Abstract: A nonparametric diffusion model with an additive fractional Brownian motion
noise is considered in this work. The drift is a nonparametric function that is
estimated by two methods. On the one hand, we propose a locally linear estimator based on
the local approximation of the drift by a linear function. On the other hand, a
Nadaraya-Watson kernel type estimator is studied. In both cases, non-asymptotic results
are established by means of deviation probability bounds. The consistency of the
estimators is obtained under a one-sided dissipative Lipschitz condition on the drift,
which ensures the ergodic property of the stochastic differential equation. Our estimators
are first constructed under continuous observations. The drift function is then estimated
from discrete-time observations, which is of the utmost importance for practical applications.
1. Introduction

The inference problem for diffusion processes is now well understood. Inference
based on discretely observed diffusions is very important from a practical point of view and
has also been the subject of numerous studies. With the development of technology, differential
equations driven by noise with memory are increasingly popular in the statistical community as
a modeling device. This work concerns the nonparametric estimation
of the drift coefficient of a fractional diffusion described by the scalar equation



$$X_t = x_0 + \int_0^t b(X_s)\,ds + B_t^H, \qquad t \ge 0, \qquad (1)$$



where $x_0 \in \mathbb{R}$ is the initial value of the process $X = (X_t)_{t \ge 0}$, and $B^H = (B_t^H)_{t \ge 0}$ is a fractional
Brownian motion (fBm in short) with Hurst parameter $H \in (0, 1)$. This means that $B^H$ is a
centered Gaussian process, starting from 0 and such that $E(B_t^H - B_s^H)^2 = |t - s|^{2H}$. Therefore
the process $B^H$ has $\eta$-Hölder continuous paths for all $\eta \in (0, H)$. If $H = 1/2$, then $B^H$ is
clearly a Brownian motion and we refer to [23, Chapter 5] for a survey about the fBm.
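As an illustration of the definition above (this is not part of the original text), the following Python sketch builds the covariance of the fBm implied by $B_0^H = 0$ and $E(B_t^H - B_s^H)^2 = |t-s|^{2H}$, namely $R(s,t) = \frac12(s^{2H} + t^{2H} - |t-s|^{2H})$, and draws one path on a small grid with a Cholesky factor. The function names `fbm_cov`, `cholesky` and `simulate_fbm` are hypothetical helpers, and the Cholesky approach is only one possible simulation method, practical for short grids.

```python
import math
import random

def fbm_cov(s, t, hurst):
    """Covariance of fBm: R(s, t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2."""
    h2 = 2.0 * hurst
    return 0.5 * (abs(s) ** h2 + abs(t) ** h2 - abs(t - s) ** h2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def simulate_fbm(times, hurst, rng):
    """One draw of (B_t^H) at strictly positive, distinct times."""
    cov = [[fbm_cov(s, t, hurst) for t in times] for s in times]
    low = cholesky(cov)
    noise = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(low[i][k] * noise[k] for k in range(i + 1))
            for i in range(len(times))]

times = [0.1 * k for k in range(1, 11)]
path = simulate_fbm(times, 0.7, random.Random(0))

# Increment variance implied by the covariance: R(t,t) + R(s,s) - 2 R(s,t).
s, t = times[2], times[7]
incr_var = fbm_cov(t, t, 0.7) + fbm_cov(s, s, 0.7) - 2.0 * fbm_cov(s, t, 0.7)
```

The quantity `incr_var` recovers $|t-s|^{2H}$, and for $H = 1/2$ the covariance reduces to $\min(s,t)$, the Brownian case.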

Stochastic differential equations driven by fBm have recently undergone considerable
development. The special case of a constant diffusion coefficient is treated more specifically in [24],
where it is proved that Equation (1) has a unique strong solution if we assume the linear
growth condition $|b(x)| \le c_b (1 + |x|)$ for $b$ when $H < 1/2$, and Hölder continuity of order
$\mathfrak{b} \in (1 - \frac{1}{2H}, 1)$ when $H > 1/2$. In this paper, we assume that these conditions hold.
If we suppose that the observation process is

$$X_t = x_0 + \int_0^t b(X_s)\,ds + \sigma B_t^H, \qquad t \ge 0,$$

with some unknown diffusion coefficient $\sigma$, then one may estimate the unknown Hurst parameter
$H$ and the diffusion coefficient $\sigma$ via the quadratic variation (see [1, 4, 13]). This is the
reason why we restrict ourselves to the case $\sigma = 1$ and $H$ known.

Almost all the existing articles concern the parametric case, in which one considers the model



where $\theta$ is the unknown parameter. Let us briefly review the works that have already been done.
When the drift is linear, $X$ is the fractional Ornstein-Uhlenbeck process and the estimation
of $\theta$ has attracted a lot of attention. This problem was first tackled in [14] using a
maximum likelihood procedure. Some least squares estimates are proposed in [11, 2]. See also
[26] for other methods. When $b$ is not necessarily linear, the pioneering work is [30] (see also the
extended electronic version [31]). The maximum likelihood estimator of $\theta$ is studied both with
continuous and discrete observations. A moment matching estimation is carried out in [25]. Let us
finally mention that a general discrete data maximum likelihood is proposed in [3] and a least
squares method is studied in [21] for the parametric estimation problem for the model described
by Equation (2).

To our knowledge, there is only one paper dealing with nonparametric estimation. In [20],
the authors consider the model

$$X_t = x_0 + \int_0^t b(X_s)\,ds + \varepsilon B_t^H, \qquad t \ge 0, \qquad (3)$$

and propose a kernel type estimator of the trend coefficient $b_s := b(x_s)$, where $(x_s)_{s \ge 0}$ is the
solution of Equation (3) when $\varepsilon = 0$. The asymptotic behavior is discussed when $\varepsilon \to 0$ on a
finite time horizon when $H > 1/2$.

Our problem is of a different nature than the previous ones. We investigate two procedures
to estimate the unknown function $b$ at a fixed point $x \in \mathbb{R}$, i.e. $b(x)$. We start with estimators
based on continuous observation of $X$. Then, using a discretization, we propose estimators
based on discrete-time data, which are the most important for practical applications. It is
difficult to work directly with the fBm $B^H$ because it is not a semimartingale. So we use the
fundamental martingale (so called in [14], see also [22]), which has nicer properties. Then
some simple and classical ideas lead us to the construction of two estimators of $b(x)$. First we
consider that the drift is a constant function in a neighborhood of the point $x$, and thus the
problem becomes parametric. The form of the least squares estimator in this parametric case
is used to propose a kernel type estimator of Nadaraya-Watson type (see (16) in Definition
1). If the drift is assumed to be linear in a small vicinity of $x$, then by similar arguments we
define a locally linear estimator of the drift function at the point $x$ (see (40) in Definition 2).
These local linear smoothers are known to avoid some undesirable edge effects.

In order to study these two estimators, we apply the same strategy. We prove some
deviation probability bounds using a non-asymptotic approach. Our probability bounds are
stated conditionally on a random set. This method has been employed in [29] for a diffusion
process with drift and variance given as nonparametric functions of the state variable. So a
first step consists in focusing on a non-asymptotic approach, for which there
is no difference between the ergodic and non-ergodic cases. Similar probability bounds are
valid for the discretization of our estimators.

Then we investigate the consistency. With kernel type estimators, it is clear that it is
necessary to impose some conditions which ensure that the observed process $(X_t)_{t \ge 0}$ returns
to any vicinity of the point $x$ infinitely many times. Ergodicity can guarantee this property
in the classical Brownian case (see [16]). The null recurrence of $X$ can also be invoked when
$H = 1/2$ as in [19]. We refer to [18] for the case of Harris recurrent diffusions. In the fractional
case such ergodic properties hold under the assumption that the drift has polynomial
growth and satisfies a one-sided dissipative Lipschitz condition (see [7] for instance). Starting
from our conditional deviation probability bound, we shall prove that the probability of the
random event with respect to which the results are stated converges to 1 under the
hypotheses on the drift $b$ cited above. Thus the weak consistency (meaning that the convergence holds
in probability) is proved for both estimators, for continuous and discrete observations.

The most important results of our paper are certainly the ones concerning the
estimation of the unknown value of the drift $b$ at a fixed point $x$ under discrete observations of
the process $X$. For simplicity, we describe the Nadaraya-Watson estimator. For equally spaced
observation times $\{t_k\}_{0 \le k \le n}$, we denote by $\varepsilon_n$ the mesh size defined by $\varepsilon_n = t_{k+1} - t_k = c\,n^{-\alpha}$ with
a positive constant $c$ and $\alpha \in (0, 1)$. We may also observe that $n = t_n^{\gamma}$ (up to a multiplicative
constant) with $\gamma = 1/(1 - \alpha) > 1$. The Nadaraya-Watson estimator of $b(x)$ with the bandwidth
$h$ is defined at time $t_n$ by

$$\hat b^{NW}_{t_n,h}(x) = \frac{\sum_{k=0}^{n-1} (t_n - t_k)^{1-2H}\, N\big(\frac{X_{t_k} - x}{h}\big)\,(X_{t_{k+1}} - X_{t_k})}{\sum_{k=0}^{n-1} (t_n - t_k)^{1-2H}\, N\big(\frac{X_{t_k} - x}{h}\big)\,(t_{k+1} - t_k)}$$

where the kernel $N$ is a positive regular function with support in $[-1, +1]$. It is obtained via
a discretization of the continuous version of the estimator (see Definition 1). Some deviation
probability bounds are proved for the continuous and the discrete versions of the
Nadaraya-Watson estimator in Theorem 2. If we assume that the drift has polynomial growth of order
$m$ and satisfies a one-sided dissipative Lipschitz condition, then we obtain ergodic properties for the
solution of (1) (see Proposition 1). This means that there exists a random variable $\bar X$ such
that the solution of Equation (1) converges as $t \to \infty$ to the stationary and ergodic process
$(\bar X_t)_{t \ge 0} = (\bar X(\theta_t(\omega)))_{t \ge 0}$, where $\theta_t$ is the appropriate shift operator on the canonical probability
space associated to the fBm. Then we shall prove the consistency of the estimator:

$$\hat b^{NW}_{t_n,h}(x) \longrightarrow b(x) \quad \text{in probability as } n \to \infty,\ h \to 0,$$

under the additional assumption that the number of approximation points satisfies $n = t_n^{\gamma}$
with $\gamma > 1 + m^2 H$ (see Remark 2 for a discussion about the dependence between $n$, $m$ and $H$)
and another assumption on the non-degeneracy of the stationary solution (see Hypothesis 4).

Similar results are obtained for the locally linear estimator. We do not present them in this
introduction because this would require further heavy notation. Nevertheless, the approach
is identical. We first construct a continuous-time version of the estimator in Definition 2, and
some deviation probability bounds are obtained in Theorem 7. Then a discrete version is
proposed (see Definition 3) and the consistency is obtained in Theorem 9, which is the other
main result of this work.

The remainder of this paper is structured as follows. In Section 2, we introduce some notation
and state our main assumptions. Then we recall the link between $B^H$ and the fundamental
martingale, for which classical stochastic calculus is available. This allows us to introduce a
new observable process having a semimartingale decomposition (see (8)). Then we state
the ergodic properties of the solution. The ergodic property under discrete observations, as
stated in Proposition 1, is new under our assumptions on the discretization procedure. Its
proof is deferred to Appendix B. Section 3 is devoted to the Nadaraya-Watson estimator
of the drift, whereas the locally linear estimator is studied in Section 4. In these two
sections, we state deviation probability bounds and consistency of the estimators
under continuous and discrete observations. Some proofs related to the locally linear estimator
are gathered in Section 5. Finally, we make use of a Fernique-type lemma that is stated
and proved in Appendix A.

2. Preliminaries 

We consider a complete probability space $(\Omega, \mathcal{F}, P)$ on which a one-dimensional fractional
Brownian motion $B^H$ is defined. We denote by $\mathcal{F}_t = \sigma(B_s^H,\ s \le t)$ the $\sigma$-field generated by $B^H$,
completed with respect to $P$.

In a first subsection, we introduce some notation and state our assumptions. Then we indicate
how to associate with the observed process an auxiliary semimartingale which is appropriate
for the statistical analysis. For this, we use the notation of [14, 22]. Thereafter, the
ergodic properties of the stochastic differential equation (1) are discussed under ad hoc
assumptions on the drift.

2.1. Notations and assumptions 

In the sequel, we use the following notation. If $f$ and $g$ are two functions from $\mathbb{R}$ to $\mathbb{R}$, we
write $f(t) \succeq g(t)$ when there exists a constant $K$ such that $f(t)/g(t) \ge K$. When the ratio of
$f$ and $g$ is constant, we write $f(t) \asymp g(t)$.

The drift $b$ may satisfy one or several items of the following hypothesis.

Hypothesis 1.

1.a) (Local regularity) For any $x$, $b$ is locally Hölder of order $\mathfrak{b}$ at the point $x$: there exists $L_x$
such that

$$|b(y) - b(y')| \le L_x\, |y - y'|^{\mathfrak{b}}$$

for any $y, y'$ in a neighborhood of $x$.
1.b) (Global regularity) The drift $b$ is continuously differentiable with a polynomial growth
condition on its derivative and on $b$ itself: there exist $c_b > 0$ and $m \in \mathbb{N}$ such that

$$|b(x)| + |b'(x)| \le c_b\,(1 + |x|^m), \qquad x \in \mathbb{R}.$$

1.c) (One-sided dissipative Lipschitz condition) There exists a constant $L > 0$ such that

$$(b(y) - b(y')) \times (y - y') \le -L\,|y - y'|^2, \qquad y, y' \in \mathbb{R}.$$

We remark that the one-sided dissipative Lipschitz condition implies that $b$ is Lipschitz
with the same constant $L$. It has been proved in [21] that there exists a unique solution to
Equation (1) under Hypotheses 1.b) and 1.c).

The kernel function that we shall need satisfies the following usual properties. 

Hypothesis 2. 

The kernel function $N$ is continuously differentiable and nonnegative, with support in $[-1, 1]$.
Without loss of generality, we may assume that it is bounded by 1.

Hypothesis 2 is supposed to be fulfilled in all the rest of this paper. 
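To make Hypothesis 2 concrete, here is one kernel that satisfies it. The choice $(1-u^2)^2$ is an assumption made for illustration only; the paper does not single out a particular kernel, and the names `kernel` and `kernel_derivative` are hypothetical.

```python
def kernel(u):
    """Quartic kernel (1 - u^2)^2 on [-1, 1], zero elsewhere; maximum 1 at u = 0."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def kernel_derivative(u):
    """Derivative -4u(1 - u^2) on [-1, 1]; it vanishes at both endpoints,
    so the kernel is continuously differentiable on the whole real line."""
    return -4.0 * u * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

# Values on a grid covering the support and a little beyond it.
grid = [-1.5 + 0.05 * k for k in range(61)]
values = [kernel(u) for u in grid]
```

The kernel is nonnegative, bounded by 1 and supported in $[-1, 1]$, which is all that Hypothesis 2 asks for.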

We also need the following notation concerning the discretization of the time interval $[0, T]$.
Hypothesis 3.

For a given $n \in \mathbb{N}$, a time discretization $\{t_k\}_{0 \le k \le n}$ is considered with equally spaced
observation times $\varepsilon_n := t_{k+1} - t_k \asymp n^{-\alpha}$ with $\alpha \in (0, 1)$.

We observe that the number of approximation points $n$ is related to the time horizon of the
discrete observations $t_n$ by $n \asymp t_n^{\gamma}$ with $\gamma = 1/(1 - \alpha) > 1$.

We remark that $\varepsilon_n \asymp n^{-(\gamma - 1)/\gamma}$. The forthcoming discussions will be held by means of $\gamma$
instead of $\alpha$ because this leads to more readable expressions.
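The relations between $n$, $\varepsilon_n$, $t_n$, $\alpha$ and $\gamma$ in Hypothesis 3 can be checked numerically. The sketch below assumes the exact form $\varepsilon_n = c\,n^{-\alpha}$ with $c = 1$ (the hypothesis only requires this up to a constant); `observation_grid` is a hypothetical helper.

```python
def observation_grid(n, alpha, c=1.0):
    """Equally spaced times t_k = k * eps_n with mesh eps_n = c * n**(-alpha)."""
    eps_n = c * n ** (-alpha)
    return [k * eps_n for k in range(n + 1)], eps_n

n, alpha = 10_000, 0.5
times, eps_n = observation_grid(n, alpha)
t_n = times[-1]                  # time horizon, equal to n**(1 - alpha) when c = 1
gamma = 1.0 / (1.0 - alpha)      # here gamma = 2

# With c = 1: n = t_n**gamma and eps_n = n**(-(gamma - 1) / gamma).
n_from_horizon = t_n ** gamma
eps_from_gamma = n ** (-(gamma - 1.0) / gamma)
```

Both the mesh going to zero and the horizon $t_n$ going to infinity are visible here: refining the grid and lengthening the observation window happen simultaneously.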

2.2. Preliminaries on fractional Brownian motion 

It is difficult to work directly with the fBm $B^H$ because it is not a semimartingale. Hence we
introduce some related processes that have nicer properties. For this purpose, let $w_H$ be
the function defined by

$$w_H(t, s) = c_H\, s^{1/2 - H}\,(t - s)^{1/2 - H}\,\mathbf{1}_{(0,t)}(s) \qquad (4)$$

where $c_H = \big(2H\,\Gamma(3/2 - H)\,\Gamma(H + 1/2)\big)^{-1}$. Thanks to [14, 22], the process $M^H = (M_t^H)_{t \ge 0}$
defined by

$$M_t^H = \int_0^t w_H(t, s)\,dB_s^H \qquad (5)$$

is a centered Gaussian process with independent increments. Its variance function is given by

$$E\big((M_t^H)^2\big) = \frac{\Gamma(3/2 - H)}{2H\,\Gamma(3 - 2H)\,\Gamma(H + 1/2)}\; t^{2-2H} =: \lambda_H\, t^{2-2H}.$$

Thus $(M_t^H)_{t \ge 0}$ is a martingale (called the fundamental martingale in [14]). The natural
filtration of the martingale $M^H$ coincides with the natural filtration of the fBm $B^H$. Finally, the
process $B = (B_t)_{t \ge 0}$ defined by

$$B_t = \frac{1}{\sqrt{\lambda_H (2 - 2H)}} \int_0^t s^{H - 1/2}\,dM_s^H$$
is a standard Brownian motion that generates the same filtration as $B^H$ and $M^H$. The inverse
relationship will also be helpful:

$$M_t^H = \big(\lambda_H (2 - 2H)\big)^{1/2} \int_0^t s^{1/2 - H}\,dB_s. \qquad (6)$$
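The constants of this subsection are easy to evaluate. The sketch below computes $c_H$, the kernel $w_H$ of (4), and the variance constant $\lambda_H$, read here as the Gamma-function ratio in front of $t^{2-2H}$ in the variance of $M^H$ (this reading is the one consistent with (6), since it gives $\langle M^H\rangle_t = \lambda_H t^{2-2H}$). The Python names are hypothetical.

```python
import math

def c_H(hurst):
    """Normalizing constant of the kernel w_H in (4)."""
    return 1.0 / (2.0 * hurst * math.gamma(1.5 - hurst) * math.gamma(hurst + 0.5))

def lambda_H(hurst):
    """Variance constant: E[(M_t^H)^2] = lambda_H * t**(2 - 2H)."""
    return math.gamma(1.5 - hurst) / (
        2.0 * hurst * math.gamma(3.0 - 2.0 * hurst) * math.gamma(hurst + 0.5)
    )

def w_H(t, s, hurst):
    """Kernel w_H(t, s) = c_H * s^(1/2 - H) * (t - s)^(1/2 - H) for 0 < s < t, else 0."""
    if not 0.0 < s < t:
        return 0.0
    return c_H(hurst) * (s * (t - s)) ** (0.5 - hurst)

# Sanity check in the Brownian case H = 1/2: w_H = 1 on (0, t) and M^H = B.
brownian_constants = (c_H(0.5), lambda_H(0.5), w_H(1.0, 0.4, 0.5))
```

For $H = 1/2$ all three quantities equal 1, so $M^H$ coincides with the Brownian motion, as expected.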

We introduce the observable process $Y = (Y_t)_{t \ge 0}$ defined by

$$Y_t = x + \int_0^t w_H(t, s)\,dX_s \qquad (7)$$
$$\phantom{Y_t} = x + \int_0^t w_H(t, s)\,b(X_s)\,ds + \int_0^t w_H(t, s)\,dB_s^H.$$

By (5) and (6) we have the following alternative expressions:

$$Y_t = x + \int_0^t w_H(t, s)\,b(X_s)\,ds + M_t^H$$
$$\phantom{Y_t} = x + \int_0^t w_H(t, s)\,b(X_s)\,ds + \big(\lambda_H (2 - 2H)\big)^{1/2} \int_0^t s^{1/2 - H}\,dB_s. \qquad (8)$$

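In order to use the martingale $M^H$, we remark that

$$w_H(t, s)\,ds = \frac{c_H}{(2 - 2H)\,\lambda_H}\,(t - s)^{1/2 - H}\,s^{H - 1/2}\,d\langle M^H \rangle_s,$$

thus if we let

$$\widetilde{w}_H(t, s) = \frac{c_H}{(2 - 2H)\,\lambda_H}\,(t - s)^{1/2 - H}\,s^{H - 1/2},$$

we may write

$$Y_t = x + \int_0^t \widetilde{w}_H(t, s)\,b(X_s)\,d\langle M^H \rangle_s + M_t^H. \qquad (9)$$

The above representations will be the starting point of the construction of our estimators.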
2.3. Ergodic properties of the stochastic differential equation 

In this subsection, we give details on the ergodic properties of the fractional SDE (1). We use
the results of Section 4 in [7], and we borrow the presentation of [21]. However, we repeat it
for the sake of completeness and add some details.

Without loss of generality, we work on the canonical probability space $(\Omega, \mathcal{F}, P)$ associated
to a fBm $B^H := (B_t^H)_{t \in \mathbb{R}}$ defined on the whole of $\mathbb{R}$. This means that $B^H$ is a zero-mean Gaussian
process whose variance function is $E(|B_t^H - B_s^H|^2) = |t - s|^{2H}$ for any $s, t \in \mathbb{R}$. The
Wiener space is the topological space $C_0(\mathbb{R}; \mathbb{R})$ equipped with the compact-open topology,
and $\mathcal{F}$ is the associated Borel $\sigma$-algebra. The measure $P$ is the distribution of the fBm $B^H$,
which now corresponds to the evaluation process $B_t^H(\omega) = \omega(t)$ for $t \in \mathbb{R}$. The law of the
two-sided fBm is invariant under the shift operators with increment $t \in \mathbb{R}$. In other words, the
operator $\theta_t$ defined from $\Omega$ to $\Omega$ by $\theta_t \omega(\cdot) = \omega(\cdot + t) - \omega(t)$ is such that the shifted process
$(B_s^H(\theta_t \cdot))_{s \in \mathbb{R}}$ is again a fBm.
Moreover, for any integrable real-valued random variable $F$ it holds that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T F(\theta_t(\omega))\,dt = E(F) \qquad P\text{-almost surely}. \qquad (10)$$

The ergodic properties of (1) hold under the assumption that the drift $b$ satisfies
the polynomial growth condition 1.b) and the one-sided dissipative Lipschitz condition 1.c).
Under these hypotheses, there exists a random variable $\bar X$ with finite moments of any order,
and such that

$$\lim_{t \to \infty} |X_t(\omega) - \bar X(\theta_t(\omega))| = 0 \qquad (11)$$

for $P$-almost all $\omega \in \Omega$. Thus the solution of Equation (1) converges, as $t$ goes to infinity,
to a stationary and ergodic process $(\bar X_t)_{t \ge 0}$ defined by $\bar X_t(\omega) = \bar X(\theta_t(\omega))$. By [9, 10] the law
of $(\bar X_t)_{t \ge 0}$ coincides with the attracting invariant measure for the solution of (1). The next
proposition will be crucial when we study the consistency of our estimators.

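Proposition 1. Assume that Hypotheses 1.b) and 1.c) are true. Consider a continuously
differentiable function $\varphi$ such that

$$|\varphi(y)| + |\varphi'(y)| \le c_\varphi\,(1 + |y|^p), \qquad y \in \mathbb{R}, \qquad (12)$$

for some $c_\varphi > 0$ and $p \in \mathbb{N}$.
1.i) We have

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \varphi(X_s)\,ds = E(\varphi(\bar X)) \qquad P\text{-a.s.} \qquad (13)$$

1.ii) If $\gamma > 1 + (m^2 + p)H$ and $\gamma > p + 1$ then

$$\lim_{n \to \infty} \frac{1}{t_n} \int_0^{t_n} \Big\{ \sum_{k=0}^{n-1} \varphi(X_{t_k})\,\mathbf{1}_{[t_k, t_{k+1})}(s) \Big\}\,ds = E(\varphi(\bar X)) \qquad P\text{-a.s.}, \qquad (14)$$

where the observation times are defined in Hypothesis 3.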

Let us make the following remarks and comments about the above proposition.

Remark 1. The proof of this result is partially contained in Proposition 2.3 and Lemma 3.1
in [21]. But in our result, we have a condition on the number of approximation points that
depends on $H$ and on the degrees of polynomial growth $m$ and $p$. We think that it is not possible
to get rid of the fact that $n \asymp t_n^{\gamma}$ with $\gamma > 1 + \max\big((m^2 + p)H,\ p\big)$.

A proof of this result is proposed in Appendix B.

Remark 2. The above result is valid for a wide class of drift functions since $m \in \{0, 1, 2, \dots\}$.
We cover the case of bounded functions as well as the case of functions with linear or polynomial
growth. The same remark is valid for the function $\varphi$.

Assume that $\varphi$ is bounded together with its derivative ($p = 0$). The condition on $\gamma$ becomes
$\gamma > 1 + m^2 H$, and thus the number of approximation points is related to the polynomial growth
order $m$ and to the Hurst parameter $H$. When $m$ is fixed, we need more points in the time
discretization when $H$ grows. This is intuitively correct since the trajectories become more
regular when $H$ increases. Thus the process oscillates less and it is necessary to observe
the diffusion more often in order to ensure that $X$ visits any neighborhood of any
fixed point very often.

When $H$ is fixed, the number of approximation points grows with the polynomial growth
coefficients $m$ and $p$. This is surely related to the speed of convergence of the process $X$ to the
stationary ergodic process $\bar X$. It is intuitive to think that when the drift coefficient
behaves like a polynomial function, the convergence must be slower when the degree is large. To
our knowledge, such an investigation has not yet been carried out.
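The two conditions of Proposition 1.ii) combine into the single requirement $\gamma > 1 + \max((m^2+p)H,\ p)$ of Remark 1. The sketch below evaluates this threshold and translates a choice of $\gamma$ back into the mesh exponent $\alpha = 1 - 1/\gamma$ of Hypothesis 3. The helper names are hypothetical.

```python
def gamma_lower_bound(m, p, hurst):
    """Threshold 1 + max((m^2 + p) * H, p) that gamma must strictly exceed."""
    return 1.0 + max((m * m + p) * hurst, p)

def mesh_exponent(gamma):
    """Exponent alpha such that eps_n ~ n**(-alpha), from gamma = 1 / (1 - alpha)."""
    return 1.0 - 1.0 / gamma

# Bounded test function (p = 0) and drift with linear growth (m = 1):
# the threshold is 1 + H, so it increases with the Hurst parameter.
thresholds = [gamma_lower_bound(1, 0, h) for h in (0.3, 0.5, 0.7)]

# Cubic drift (m = 3) with p = 0 and H = 0.7 needs a much larger gamma.
cubic_threshold = gamma_lower_bound(3, 0, 0.7)
alpha_for_cubic = mesh_exponent(cubic_threshold + 0.1)
```

A larger threshold forces $\alpha$ closer to 1, that is, a mesh that shrinks almost as fast as $1/n$ and hence a time horizon that grows slowly with $n$.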

3. The Nadaraya-Watson type estimator

Our first method for estimating the value of the drift $b$ at a fixed point $x \in \mathbb{R}$ is inspired by the
Nadaraya-Watson kernel regression. We construct this estimator in the following subsection.
Thereafter some deviation probability bounds are given and, finally, the consistency is
stated under the ergodicity assumption.

3.1. Construction and decomposition of the Nadaraya-Watson estimator

First of all, we assume that the whole trajectory $(X_t)_{0 \le t \le T}$ is observed between the times 0
and $T$. We will discuss a discretized version of our estimator in a moment.

The construction of a Nadaraya-Watson estimator is based on a simple idea. First we suppose
that the drift $b$ is a constant function, that is $b(x) = \theta$ for any $x$. Hence an estimator of $\theta$ is
an estimator of $b(x)$. We denote



J o 

The unknown parameter $\theta$ can be estimated by the least squares method (see for example
[17]). The least squares estimator of $\theta$ obtained at time $t$ is given by




Similarly to (9), we introduce the observable process Y e = (Yf)t>o 




8(t) 



fQW H (t,s)dY s l 



ti(w H (t,s)) 2 d(M»} s 



We denote

$$\alpha_H = \frac{c_H}{\sqrt{\lambda_H (2 - 2H)}} \qquad (15)$$



and we remark that 




Thus we obtain the alternative representation of 0(t): 
In the context of our fractional diffusion (1), the drift $b$ is not constant. Hence we approximate
it by a constant function $\theta$ in a neighborhood $[x - h, x + h]$ of the point $x$. For this purpose
we consider a kernel function $N$ satisfying Hypothesis 2. The above discussion leads to the
following definition of an estimator of $b(x)$ by means of the observable process $Y$.

Definition 1. The Nadaraya-Watson estimator of the drift $b$ at a point $x$ with the bandwidth
$h$ is defined at time $t$ by

(X) ~ JJoirf,-.)-"^^)* <16) 
with the convention that a/0 := 0. Equivalently, the more classical expression holds 

Using the representation (8) and the fact that N < 1, we notice that the stochastic integral 
in (16) is well defined. The integral with respect to the process X in (17) is just an alternative 
writing of the one with respect to Y in (16). Moreover, starting from (16) and using (8), (4) 
and (15) we may express our estimator as 

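$$\hat b^{NW}_{t,h}(x) = \frac{\int_0^t \alpha_H^2\,(t-s)^{1-2H}\,N\big(\frac{X_s - x}{h}\big)\,b(X_s)\,ds}{\int_0^t \alpha_H^2\,(t-s)^{1-2H}\,N\big(\frac{X_s - x}{h}\big)\,ds} + \frac{\int_0^t \alpha_H\,(t-s)^{1/2-H}\,N\big(\frac{X_s - x}{h}\big)\,dB_s}{\int_0^t \alpha_H^2\,(t-s)^{1-2H}\,N\big(\frac{X_s - x}{h}\big)\,ds}.$$

Then we obtain the following decomposition of the error:

$$\hat b^{NW}_{t,h}(x) = b(x) + \xi_{x,h}(X_t) + r^{loc}_{x,h}(X_t) \qquad (18)$$

where

$$\xi_{x,h}(X_t) = \frac{\int_0^t \alpha_H\,(t-s)^{1/2-H}\,N\big(\frac{X_s - x}{h}\big)\,dB_s}{\int_0^t \alpha_H^2\,(t-s)^{1-2H}\,N\big(\frac{X_s - x}{h}\big)\,ds}, \qquad r^{loc}_{x,h}(X_t) = \frac{\int_0^t (t-s)^{1-2H}\,N\big(\frac{X_s - x}{h}\big)\,[b(X_s) - b(x)]\,ds}{\int_0^t (t-s)^{1-2H}\,N\big(\frac{X_s - x}{h}\big)\,ds}.$$

There are two kinds of errors in (18). The first one is stochastic (the term $\xi_{x,h}(X_t)$). The
second one ($r^{loc}_{x,h}(X_t)$) represents the accuracy of the local approximation of $b$ by a constant
function in a neighborhood of the point $x$.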

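From a practical point of view, the case of real interest is the one in which the observed data are
discrete. So we now provide an effective estimation procedure. We assume that the process
$(X_t)_{0 \le t \le T}$ is observed at times $(t_k)_{0 \le k \le n}$ (see Hypothesis 3). We discretize the expression of
$\hat b^{NW}_{t,h}(x)$ given in (17) by Riemann sums as

$$\hat b^{NW}_{t_n,h}(x) = \frac{\sum_{k=0}^{n-1} (t_n - t_k)^{1-2H}\,N\big(\frac{X_{t_k} - x}{h}\big)\,(X_{t_{k+1}} - X_{t_k})}{\sum_{k=0}^{n-1} (t_n - t_k)^{1-2H}\,N\big(\frac{X_{t_k} - x}{h}\big)\,(t_{k+1} - t_k)}.$$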
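The discretized estimator is a ratio of two weighted sums and is straightforward to compute from the observations. The following sketch is one possible implementation under stated assumptions: the quartic kernel is an arbitrary choice satisfying Hypothesis 2, the function name `nw_drift_estimate` is hypothetical, and the convention $a/0 := 0$ of Definition 1 is applied when no observation falls within distance $h$ of $x$.

```python
def kernel(u):
    """Illustrative kernel satisfying Hypothesis 2: (1 - u^2)^2 on [-1, 1]."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def nw_drift_estimate(times, path, x, h, hurst, kern=kernel):
    """Discrete Nadaraya-Watson estimate of b(x) at time t_n = times[-1].

    times : observation times t_0 < t_1 < ... < t_n
    path  : observed values X_{t_0}, ..., X_{t_n}
    """
    t_n = times[-1]
    num = 0.0
    den = 0.0
    for k in range(len(times) - 1):
        # Weight (t_n - t_k)^(1 - 2H) * N((X_{t_k} - x) / h); t_n - t_k > 0 here.
        weight = (t_n - times[k]) ** (1.0 - 2.0 * hurst) * kern((path[k] - x) / h)
        num += weight * (path[k + 1] - path[k])
        den += weight * (times[k + 1] - times[k])
    return num / den if den != 0.0 else 0.0  # convention a/0 := 0

# Noise-free sanity check: a path with constant slope -0.3 near x = 0.5.
times = [0.01 * k for k in range(101)]
path = [0.5 - 0.3 * t for t in times]
estimate = nw_drift_estimate(times, path, x=0.5, h=1.0, hurst=0.7)
```

On this noise-free path every increment equals $-0.3\,(t_{k+1}-t_k)$, so the estimate returns the slope $-0.3$ up to rounding, whatever the weights are.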
In order to have a decomposition of the error, we consider the simple process $(Q^n_s)_{s \ge 0}$ defined
by

$$Q^n_s = \sum_{k=0}^{n-1} (t_n - t_k)^{1-2H}\,N\Big(\frac{X_{t_k} - x}{h}\Big)\,\mathbf{1}_{[t_k, t_{k+1})}(s).$$

With the help of Equations (7) and (8) we may rewrite $\hat b^{NW}_{t_n,h}(x)$ as



1 



&*w() / Q n s dX s 

Jo Qs ds Jo S 

i rtn fyn 

dY„ 



P Q n Q n s ds Jo w H (t,s) 

= -r^ ( r Q n s b{X s ) ds + (\ H (2-2H)) l l 2 Q } , s l ' 2 ~ H dB^\ . 

f* n ds \Jo ^ K ' V V n Jo wh&s) J 

Thus we obtain a decomposition similar to (18):

b™ h (x) - b(x) = MX£) + r l ° c h (X tn ) + r^(X tn ) (19) 



with 

£x,h(Xt n , 



Jt a H (t - s)^ 1 ' 2 Q n s dB s 
Jl n o? H Q n s ds 

It E n k =o(t n - t k Y~ 2H N{^) (b(X tk ) - b(x))l [tk , k+l) (s) ds 

ft Qs ds 

it U=o(tn - t k y- 2H N(^) (b(X s ) - b(X tk ))l [tkM (s) ds 



itQs ds 



We remark that $r^{loc}_{x,h}(X_{t_n})$ again represents the accuracy of the local approximation of $b$ by a
constant function in a neighborhood of the point $x$, but only at the discrete times $(t_k)_{0 \le k \le n}$.
It is worth noticing that a new term is involved, namely the last term in (19). It represents the error made when
one discretizes the continuous process $(X_s)_{s \ge 0}$.

In the next subsection, we study deviation probability bounds for $\hat b^{NW}_{t,h}(x)$ and $\hat b^{NW}_{t_n,h}(x)$.

3.2. Deviation probability 

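In order to study the error from a probabilistic point of view, we need to introduce for some
$\rho > 0$ and $\beta > 0$ the random sets

$$A^{NW}_{t,h} = \Big\{ \int_0^t \alpha_H^2\,(t - s)^{1-2H}\,N\Big(\frac{X_s - x}{h}\Big)\,ds \ge \rho\,t^{1-H+\beta} \Big\}$$

and

$$A^{NW,n}_{t_n,h} = \Big\{ \int_0^{t_n} \alpha_H^2 \sum_{k=0}^{n-1} (t_n - t_k)^{1-2H}\,N\Big(\frac{X_{t_k} - x}{h}\Big)\,\mathbf{1}_{[t_k, t_{k+1})}(s)\,ds \ge \rho\,t_n^{1-H+\beta} \Big\}.$$

Some properties of the Nadaraya-Watson estimator are stated in the following theorem,
conditionally on the above events.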
Theorem 2. Under Hypothesis 1.a), when the trajectory is continuously observed, we have
for any $\zeta > 0$:

$$P\Big( |\hat b^{NW}_{t,h}(x) - b(x)| > L_x h^{\mathfrak{b}} + \zeta,\ A^{NW}_{t,h} \Big) \le 2 \exp\big( -\rho^2 (1 - H)\,\zeta^2\,t^{2\beta} \big). \qquad (20)$$

We assume that $b$ satisfies Hypotheses 1.b) and 1.c). There exist $t_0 > 1$, $c_{x,h,L} > 1$ and a
constant $c_{H,\gamma}$ such that for any $t_n \ge t_0$, the following conditional deviation probability bound
holds:

p(\b^ h (x)-b(x)\>Lh + c xAL e n + (, A™ h ) 

g(7-l) / 2H(-y-l) \ 

^ 2eX p(" /Li i^^)+ C ^^ 7+1 ^[-iztn^ J • (21) 

It is worth noting that the above results about the quality of our estimation are non-asymptotic
and do not require any ergodic or mixing properties of the observed process. Clearly the event
$A^{NW,n}_{t_n,h}$ is completely determined by the observed values of the trajectory of $X$. It is therefore
always possible to check whether the path belongs to this set or not. If it does not, we
are not able to guarantee a reasonable quality for the estimation of $b(x)$.
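Since the discrete event is a function of the observations only, membership can be tested directly. The sketch below evaluates the integral defining $A^{NW,n}_{t_n,h}$, which reduces to a finite sum because the integrand is piecewise constant. It assumes that the constant $\alpha_H$ of (15) is supplied by the caller (its value is passed in as `alpha_H`), and the kernel and the name `in_event_A` are hypothetical choices for illustration.

```python
def kernel(u):
    """Illustrative kernel satisfying Hypothesis 2: (1 - u^2)^2 on [-1, 1]."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def in_event_A(times, path, x, h, hurst, alpha_H, rho, beta, kern=kernel):
    """Check whether the observed path belongs to the discrete random set.

    The integral of alpha_H^2 * sum_k (t_n - t_k)^(1-2H) N((X_{t_k}-x)/h) 1_[t_k,t_{k+1})
    over [0, t_n] equals the weighted sum computed below.
    """
    t_n = times[-1]
    total = 0.0
    for k in range(len(times) - 1):
        weight = (t_n - times[k]) ** (1.0 - 2.0 * hurst) * kern((path[k] - x) / h)
        total += weight * (times[k + 1] - times[k])
    return alpha_H ** 2 * total >= rho * t_n ** (1.0 - hurst + beta)

# A path that stays at x for the whole horizon t_n = 4, with H = 1/2:
# the left-hand side equals alpha_H^2 * t_n = 4.
times = [0.01 * k for k in range(401)]
path = [0.0] * 401
inside = in_event_A(times, path, 0.0, 1.0, 0.5, 1.0, rho=1.0, beta=0.25)
outside = in_event_A(times, path, 0.0, 1.0, 0.5, 1.0, rho=2.0, beta=0.25)
```

In this example the threshold $\rho\,t_n^{1-H+\beta}$ is about 2.83 for $\rho = 1$ and about 5.66 for $\rho = 2$, so only the first check succeeds.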

In the following remark, we discuss the rate of our approximation. 

Remark 3.

1. If we choose a time-dependent bandwidth $h_t$ such that $h_t^{\mathfrak{b}} \asymp L_x^{-1}\,t^{-\beta/2}$, then the rate of
estimation is of order $t^{-\beta/2}$:

$$P\Big( |\hat b^{NW}_{t,h_t}(x) - b(x)| \succeq t^{-\beta/2},\ A^{NW}_{t,h_t} \Big) \le 2 \exp\big( -\rho^2 (1 - H)\,t^{\beta} \big).$$

2. In the discrete case, for a fixed $\beta > 0$, we consider:

• $h_n$ a time-dependent bandwidth with $h_n \asymp L^{-1}\,t_n^{-\beta/2}$;

• $\gamma = (4H + \beta)/(4H - \beta)$;

• $\varepsilon_n = t_n/n \asymp t_n^{-(\gamma - 1)}$ with $\gamma - 1 = 2\beta/(4H - \beta) > \beta/2$.

Then the approximation rate is again of order $t_n^{-\beta/2}$ since (21) implies

P(fcW " b(x)\ h t^ 12 , AZ, hn ) 1 exp {-C p , L , H t() +C exp (-g) , 
with Hi, [J>2 > 0. 
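The tuning described in the second item of Remark 3 can be written out explicitly. The sketch below takes the proportionality constants equal to 1 and assumes $0 < \beta < 4H$, so that $\gamma$ is well defined and larger than 1; both are assumptions made for illustration, and `discrete_tuning` is a hypothetical helper.

```python
def discrete_tuning(hurst, beta, t_n, lipschitz):
    """Return (gamma, h_n, eps_n) for the discrete scheme of Remark 3, item 2.

    gamma = (4H + beta) / (4H - beta), h_n = t_n^(-beta/2) / L,
    eps_n = t_n^(-(gamma - 1)); all constants are set to 1.
    """
    if not 0.0 < beta < 4.0 * hurst:
        raise ValueError("this sketch requires 0 < beta < 4H")
    gamma = (4.0 * hurst + beta) / (4.0 * hurst - beta)
    h_n = t_n ** (-beta / 2.0) / lipschitz
    eps_n = t_n ** (-(gamma - 1.0))
    return gamma, h_n, eps_n

hurst, beta = 0.5, 1.0
gamma, h_n, eps_n = discrete_tuning(hurst, beta, t_n=100.0, lipschitz=2.0)

# Identity used in the remark: gamma - 1 = 2 * beta / (4H - beta) > beta / 2.
gap = 2.0 * beta / (4.0 * hurst - beta)
```

With $H = 1/2$ and $\beta = 1$ this gives $\gamma = 3$, so the number of observations grows like $t_n^3$ while the bandwidth shrinks like $t_n^{-1/2}$.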

The stochastic integral that appears in the expression of $\xi_{x,h}(X_t)$ is a fractional martingale
(so called in [12]). In order to study the asymptotic behavior of the Nadaraya-Watson
estimators, we need asymptotic properties of this fractional martingale. These are obtained thanks to
a straightforward exponential inequality for this kind of stochastic integral. We refer to [28]
for related results on exponential inequalities for fractional martingales.

Lemma 3. We consider $K = (K_s)_{s \ge 0}$, an adapted process such that for a positive function $v$

$$\sup_{0 \le u \le t} \int_0^u (t - s)^{1-2H}\,|K_s|^2\,ds \le v(t).$$

¹ See (32) and (29) in the proof for explicit expressions of $t_0$ and $c_{x,h,L}$.

Then for any $\zeta > 0$ it holds that

Proof. For a fixed time t, we consider the true martingale (Z*)o< w <t defined by 



12 



(22) 



Here $t$ is considered as a fixed parameter for the martingale $Z^t$. It holds that

$$\langle Z^t \rangle_t = \int_0^t (t - s)^{1-2H}\,|K_s|^2\,ds \le v(t).$$

The classical exponential inequality (see [27], Exercise 3.16, Chapter 4) implies the result. □

Now we can prove Theorem 2. 
Proof. The proof is divided into several steps.

Step 1: proof of (20)

We use the decomposition (18). Obviously we have the following estimate:

$$|r^{loc}_{x,h}(X_t)| \le L_x h^{\mathfrak{b}}.$$

Thanks to the exponential inequality (22) of Lemma 3 we have for any $\zeta > 0$

P^fcTOI > C , < P (j\t - s )^- H N{^)dB s > p C 

<2exp (- P 2 {l-H) ( 2 t 2 ? 

By (18), (23) and (24) we obtain 



(23) 



(24) 



P[\^(x)-b(x)\ > L x h b + C, A^)<P[\^, h (X t )\ + \r 1 °UX t )\>L x h b + (, Al 



|NW 

:,h 



<p(M^I>C, A™ 

< 2exp (-p 2 (1 - H) ( 2 t 2/3 



and (20) is proved. 
Step 2: proof of (21) 

We analyse separately the three terms in the decomposition (19). We begin with the stochastic term $\xi_{x,h}(X_{t_n})$
and we write



P(IC,*(*OI>C, A% 



< P 



Jj" a H (t - s) H -V 2 Q n s dB s \>C jj" a\ Q n s ds , A™ 
We fix $t_n$ and we consider the martingale $Z^n := (Z^n_u)_{0 \le u \le t_n}$ defined by

$$Z^n_u = \int_0^u \alpha_H\,(t_n - s)^{H - 1/2}\,Q^n_s\,dB_s.$$

Since $0 \le N \le 1$, the quadratic variation of the martingale $Z^n$ satisfies:

$$\langle Z^n \rangle_{t_n} = \alpha_H^2 \int_0^{t_n} (t_n - s)^{2H-1}\,\Big| \sum_{k=0}^{n-1} (t_n - t_k)^{1-2H}\,N\Big(\frac{X_{t_k} - x}{h}\Big)\,\mathbf{1}_{[t_k, t_{k+1})}(s) \Big|^2 ds$$
$$\le \alpha_H^2 \sum_{k=0}^{n-1} \int_0^{t_n} (t_n - s)^{2H-1}\,(t_n - t_k)^{2-4H}\,\mathbf{1}_{[t_k, t_{k+1})}(s)\,ds.$$

When $H > 1/2$, it holds that $(t_n - s)^{2H-1} \le (t_n - t_k)^{2H-1}$ for $t_k \le s < t_{k+1}$. Hence

$$\langle Z^n \rangle_{t_n} \le \alpha_H^2 \sum_{k=0}^{n-1} \int_0^{t_n} (t_n - t_k)^{1-2H}\,\mathbf{1}_{[t_k, t_{k+1})}(s)\,ds \le \alpha_H^2 \int_0^{t_n} (t_n - s)^{1-2H}\,ds \le \frac{\alpha_H^2}{2 - 2H}\,t_n^{2-2H}.$$

For the second case, when $H < 1/2$, the inequality $t_n - s \ge t_n - t_{k+1} = t_n - t_k - \Delta$ (valid for
$t_k \le s < t_{k+1}$) implies that

$$(t_n - t_k)^{2-4H} \le (t_n - s)^{2-4H} \Big( 1 + \frac{\Delta}{t_n - s} \Big)^{2-4H} \le 4\,(t_n - s)^{2-4H}.$$

Therefore we obtain

$$\langle Z^n \rangle_{t_n} \le 4 \alpha_H^2 \int_0^{t_n} (t_n - s)^{1-2H}\,ds = \frac{2 \alpha_H^2}{1 - H}\,t_n^{2-2H}.$$
By Lemma 3 we conclude that 

P (\^,h(X tn )\ > C , At th ) < 2exp (- p2( \-J K2 tf) . (25) 



Now we study the error term $r^{loc}_{x,h}(X_{t_n})$ from (19). The drift $b$ is Lipschitz by Hypothesis
1.c). So we have

$$|r^{loc}_{x,h}(X_{t_n})| \le L h. \qquad (26)$$
The last term i~^V(Xt n ) is more difficult to handle. At first we write 

By Equation (1),

$$X_s - X_{t_k} = \int_{t_k}^s b(X_r)\,dr + B_s^H - B_{t_k}^H,$$

and when $|X_{t_k} - x| \le h$ we may write for $t_k \le s < t_{k+1}$:

$$|X_s - X_{t_k}| \le \int_{t_k}^s \big\{ |b(X_r) - b(X_{t_k})| + |b(X_{t_k}) - b(x)| + |b(x)| \big\}\,dr + |B_s^H - B_{t_k}^H|$$
$$\le \varepsilon_n \big\{ L h + c_b (1 + |x|^m) \big\} + L \int_{t_k}^s |X_r - X_{t_k}|\,dr + \varepsilon_n^{\eta}\,\|B^H\|_{0,t_n,\eta},$$

where we have denoted for $0 < \eta < H$:

$$\|B^H\|_{0,t_n,\eta} = \sup_{0 \le r < s \le t_n} \frac{|B_s^H - B_r^H|}{|s - r|^{\eta}}.$$

When e n is small, the Gronwall inequality implies that for any tk < s < ifc+i: 

|^-^J <e n {i/ l + £% (l + |xr)}+ e^lS^Ho,^ • (27) 



Therefore 



with 



|*3(XO| < ^,^^ + ^411^110,^,1) , (28) 



$$c_{x,h,L} = L \big\{ L h + c_b (1 + |x|^m) \big\}. \qquad (29)$$

Now we are able to end the proof of (21). Starting from the decomposition (19), using the 
estimations (25), (26) and (28), we obtain 



>(\b™ h (x)-b(x)\ > Lh + C xAL e n + (, A™ h 

< J>(\H x , h (Xt n )\ + \ri c h (X tn )\ + \r^(X tn )\ >Lh + C M e n + ( , A% h 
<P^ x MXtJ\+Lh + C xAL e n + Le^\\B H \\o,t nM >Lh + C M e n + C, A 

< p(|^ A (X t J| > C/2 , A% h )+P(L<*\\B H \\ 0)tnA > C/2 



NW 
tn,h 



<2e, P (-^^ tf) +p(\\B H \\ , tnA ><) . (30) 



We treat the last term on the right-hand side of the above inequality. We need a Fernique-type
lemma for the exponential moment of the Hölder norm of the trajectories of the fBm $B^H$.
Such a result is stated in Lemma 10 in Appendix A. Chebyshev's exponential inequality
yields

P( r ||£ JT ||o 1 t ni h > rfr) < ex P f-rfr) E ( ex P (\\B H \\o. 



l|vJ,tn;f) 

< c HA (1 + i*-*) exp (v** - ^) , (31) 

where we have used (59) from Lemma 10. We recall that $\varepsilon_n \asymp n^{-(\gamma - 1)/\gamma}$, where $n$ is the number
of approximation points satisfying $n = t_n^{\gamma}$ with $\gamma > 0$. We may write (31) as

P(ll^llo,t„,^ > ifnr) < est (l + t*"*) exp (-4- L tl^ (l - ^tT^tlM 
If we choose $\eta$ such that $H > \eta > 2H/(\gamma + 1)$, then $2H - \eta - \gamma\eta < 0$. For simplicity we fix

$$\eta_0 = \frac{H}{2} + \frac{H}{\gamma + 1} = \frac{H (\gamma + 3)}{2 (\gamma + 1)}.$$

When t n is large, more precisely: 

tn > r := ( — — J V 1 , (32) 

we have 

( 1 _ 256 H 2 L t 2(H-t) ) ^0(1-7) A > I 

V Cl)o 2 n n / - 2 

and (31) yields 

(II^IIoa^o > ^) <2c HM t»- t >° e X p(-^4 o(7 - 1) ) • (33) 



P 

With $\gamma > 1$ we have

$$\eta_0 (\gamma - 1) = \frac{H (\gamma - 1)(\gamma + 3)}{2 (\gamma + 1)} \ge \frac{2H (\gamma - 1)}{\gamma + 1}.$$

We insert (33) into (30) and we deduce (21).

The proof of Theorem 2 is now complete. □ 

In the next subsection, we remove the conditioning from these results.
3.3. Consistency of the Nadaraya-Watson estimators

We start the investigation of the consistency of our estimators with the following proposition,
which is anecdotal but interesting in itself.

The strong consistency of $\hat b^{NW}_{t,h}(x)$ is naturally related to the almost-sure convergence to
0 of $\xi_{x,h}(X_t)$ (see the decomposition (18)). If $H = 1/2$, by the strong law of large numbers
for martingales, this convergence holds as soon as $\int_0^\infty N^2\big(\frac{X_s - x}{h}\big)\,ds = +\infty$ almost surely.
When $H < 1/2$, such a condition also ensures the convergence of the fractional stochastic
term $\xi_{x,h}(X_t)$.

Proposition 4. Under Hypothesis 1.a), when $H < 1/2$ and

$$\int_0^\infty N^2\Big(\frac{X_s - x}{h}\Big)\,ds = +\infty \qquad P\text{-a.s.} \qquad (34)$$

then the Nadaraya-Watson estimator is strongly consistent:

TO*) : P T > b W ■ ( 35 ) 

Of course, when the bandwidth is time dependent with $\lim_{t\to\infty} h_t = 0$, we have $\lim_{t\to\infty} \hat b^{NW}_{t,h_t}(x) = b(x)$ almost surely. The proof of Proposition 4 is based on the following fractional version of the integral Toeplitz lemma.

Lemma 5. Let $\alpha > 0$. Let $(x_t)_{t\ge0}$ be a continuous real function such that $\lim_{t\to\infty} x_t = x$ and let $(\gamma_t)_{t\ge0}$ be a measurable, positive and bounded function. Then it holds that

$$\frac{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,x_s\,ds}{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,ds} \xrightarrow[t\to\infty]{} x$$

provided that $\lim_{t\to\infty}\int_0^t \gamma_s\,ds = +\infty$.

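Proof. Let $\varepsilon > 0$ and $A$ be such that $|x_s - x| \le \varepsilon$ for $s \ge A$. We denote $M_A = \sup_{s\le A}|x_s - x|$. By Fubini's theorem

$$\int_0^t (t-s)^{\alpha}\,\gamma_s\,ds = \alpha\int_0^t (t-s)^{\alpha-1}\Big(\int_0^s \gamma_r\,dr\Big)ds$$

and we write for $t \ge A$

$$\left|\frac{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,x_s\,ds}{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,ds} - x\right| \le \frac{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,|x_s - x|\,ds}{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,ds} \le \varepsilon + M_A\,\frac{\int_0^A (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,ds}{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,ds}. \qquad (36)$$

Another application of Fubini's theorem implies that

$$\frac{\int_0^A (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,ds}{\int_0^t (t-s)^{\alpha-1}\big(\int_0^s \gamma_r\,dr\big)\,ds} = \frac{\int_0^A \big((t-r)^{\alpha} - (t-A)^{\alpha}\big)\gamma_r\,dr}{\int_0^t (t-r)^{\alpha}\gamma_r\,dr} \le \frac{\int_0^A (t-r)^{\alpha}\gamma_r\,dr}{\int_0^t (t-r)^{\alpha}\gamma_r\,dr} \le \frac{A\,t^{\alpha}\,\sup_{s\ge0}|\gamma_s|}{\int_0^{t/2}(t-r)^{\alpha}\gamma_r\,dr} \le \frac{A\,t^{\alpha}\,\sup_{s\ge0}|\gamma_s|}{(t/2)^{\alpha}\int_0^{t/2}\gamma_r\,dr}$$

and the last term tends to $0$ as $t\to\infty$. Substituting this convergence into (36), we obtain the result. □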
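The statement of Lemma 5 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the test functions $x_s = 1 + 1/(1+s)$ and $\gamma_r = 1$, the value $\alpha = 1/2$, and the function name `toeplitz_ratio` are all hypothetical choices. The substitution $u = (t-s)^{\alpha}$ removes the integrable singularity at $s = t$, so that a plain midpoint rule suffices.

```python
def toeplitz_ratio(x, big_gamma, alpha, t, n=20000):
    """Ratio of Lemma 5 at time t.

    x(s)         : the function x_s
    big_gamma(s) : the primitive int_0^s gamma_r dr
    With u = (t - s)^alpha, (t - s)^(alpha - 1) ds becomes du / alpha,
    and the factor 1 / alpha cancels in the ratio.
    """
    upper = t ** alpha
    du = upper / n
    num = 0.0
    den = 0.0
    for i in range(n):
        u = (i + 0.5) * du              # midpoint in the u variable
        s = t - u ** (1.0 / alpha)      # back to the original variable
        w = big_gamma(s)
        num += w * x(s)
        den += w
    return num / den


# Assumed example: x_s -> 1 and gamma = 1, so int_0^t gamma = t -> infinity.
x = lambda s: 1.0 + 1.0 / (1.0 + s)
big_gamma = lambda s: s

errors = [abs(toeplitz_ratio(x, big_gamma, 0.5, t) - 1.0)
          for t in (10.0, 100.0, 1000.0, 10000.0)]
print(errors)
```

The printed errors decrease as $t$ grows, in line with the convergence of the weighted average to the limit $x = 1$.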

Now we prove (35). 

Proof. By (18) and (23) we have 

\b^(x)-b(x)\<L x h b + \^ h (X t )\ . 

Let $\alpha = 1/2 - H > 0$. By the stochastic Fubini theorem

$$\int_0^t (t-s)^{\alpha}\,N\Big(\frac{X_s - x}{h}\Big)\,dB_s = \alpha\int_0^t (t-s)^{\alpha-1}\Big(\int_0^s N\Big(\frac{X_r - x}{h}\Big)\,dB_r\Big)ds$$

and we write $\xi_{x,h}(X_t) = T^1_{x,h}(X_t) \times T^2_{x,h}(X_t)$ with



T X,h( X t 



and 



Ra H (t-8)°-i {j°N 2 {^)dr)ds 



St(t-8)**N{Zj=£)d8 m 
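Since $\int_0^{\infty} N^2\big(\frac{X_s - x}{h}\big)\,ds = +\infty$ almost surely, the strong law of large numbers for martingales gives

$$\frac{\int_0^s N\big(\frac{X_r - x}{h}\big)\,dB_r}{\int_0^s N^2\big(\frac{X_r - x}{h}\big)\,dr} \xrightarrow[s\to\infty]{} 0 \quad \text{almost surely,}$$

and the generalized Toeplitz lemma (Lemma 5) yields that $\lim_{t\to\infty} T^1_{x,h}(X_t) = 0$ almost surely. Now we prove that the second term $T^2_{x,h}(X_t)$ is bounded when $t$ is large. Since the kernel function $N$ satisfies $0 \le N^2 \le N$, we have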



< 



f*(t-s) 2 «N(^)ds 

(ti(t-s?«N(^) d s) 1/2 (jiN(^)ds) 1/2 
ti(t- S y"N(^)ds 



where we have used the Cauchy-Schwarz inequality. For $t$ large enough that $\int_0^{t-1} N\big(\frac{X_s - x}{h}\big)\,ds \ge 2$, we may write



2 



\Ti h {x t )\ < 



< 



fi-\t-8)*»N{*j=z)da 



r,N{^)ds 
1 

< 1 + - . 

2 

Therefore, $\lim_{t\to\infty}\xi_{x,h}(X_t) = 0$ almost surely and the proof is complete. □

Remark 4. Let $M^{(\alpha)} = (M^{(\alpha)}_t)_{t\ge0}$ with $\alpha = 1/2 - H > 0$ be the fractional martingale (as it is called in [12]) defined by $M^{(\alpha)}_t = \int_0^t (t-s)^{\alpha}\,N\big(\frac{X_s - x}{h}\big)\,dB_s$. We have seen in the previous proof that (34) ensures that the fractional martingale satisfies the strong law of large numbers:

$$\frac{M^{(\alpha)}_t}{\prec M^{(\alpha)}\succ_t} \xrightarrow[t\to\infty]{P\text{-a.s.}} 0,$$

with a "fractional bracket" defined by $\prec M^{(\alpha)}\succ_t = \int_0^t (t-s)^{2\alpha}\,N^2\big(\frac{X_s - x}{h}\big)\,ds$. This is, to our knowledge, the first result on the asymptotic behavior of fractional martingales. Unfortunately, the techniques we employed to prove this convergence are not suited to proving a similar result for a fractional martingale with $\alpha < 0$. See also [28] for further discussion of this topic.

Now we work under the one-sided dissipative Lipschitz condition that ensures the ergodic properties of the observation process $X$. Before stating the main result of this paper, we make the following non-degeneracy assumption on the stationary solution.

Hypothesis 4. The law of $\bar X$ is non-degenerate in a neighborhood of $x$: for any small bandwidth $h$ it holds

$$E\Big[N\Big(\frac{\bar X - x}{h}\Big)\Big] > 0.$$

Remark 5. It seems important to understand when the law of the stationary solution is non-degenerate. This will certainly be the subject of future work. The hypothesis is obviously true if the distribution of $\bar X$ has full support. In particular, it is satisfied in the case of the ergodic fractional Ornstein-Uhlenbeck process.

Theorem 6. We assume that Hypotheses 1.b) and 1.c) hold true.

When the whole trajectory is observed, the Nadaraya-Watson estimator is consistent:

$$\hat b^{NW}_{t,h}(x) \xrightarrow[t\to\infty,\,h\to0]{} b(x), \quad \begin{cases} \text{almost surely} & \text{when } H < 1/2; \\ \text{in probability} & \text{when } H > 1/2. \end{cases} \qquad (37)$$

Its discretized version is also consistent when we assume that the number of approximation points satisfies $n \asymp t_n^{\gamma}$ with $\gamma > 1 + m^2 H$:

$$\hat b^{NW}_{t_n,h}(x) \xrightarrow[t_n\to\infty,\,h\to0]{\text{in probability}} b(x). \qquad (38)$$

We observe that the number of approximation points depends on the regularity of $b$ in the above result. This has already been discussed in Remark 2.

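Proof. In the following arguments, we will make use of Proposition 1 with $\varphi = N$ or $\varphi = N^2$. Obviously, (12) is satisfied with $p = 0$.

Step 1: proof of (37)

When $H < 1/2$, by (13) we obtain

$$\frac{1}{t}\int_0^t N^2\Big(\frac{X_s - x}{h}\Big)\,ds \xrightarrow[t\to\infty]{} E\Big[N^2\Big(\frac{\bar X - x}{h}\Big)\Big] \quad \text{almost surely.}$$

By the Cauchy-Schwarz inequality and Hypothesis 4,

$$E\Big[N^2\Big(\frac{\bar X - x}{h}\Big)\Big]^{1/2} \ge E\Big[N\Big(\frac{\bar X - x}{h}\Big)\Big] > 0$$

and (34) is satisfied. Thereby (37) is a consequence of (35) from Proposition 4.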

Now $H > 1/2$ and let $\varepsilon > 0$. We use the probability deviation bound (20) with $\zeta = t^{-\beta/2}$. For $t$ large enough and small $h$ we have



)<P[\b^(x)-b(x)\> Lh b + t~^ 2 

< p - b(x)\ > Lh b + 1-^ 2 , AT) + p (nVW 



<2exp(-p 2 (l-#)^)+P(tAA7)- 
The consistency (37) will be proved as soon as 

P (0\^7) = P (jj a H (t - s)'- 2H N (^s) ds < p t 1 -^) — h j . (39) 
Since $(t-s)^{1-2H} \ge t^{1-2H}$,

C {fiN(±j=*) ds<^ t H ^} C {I jjtf (^) <fe < £ t^" 1 } := A,, • 



By the ergodic result (13), 



> 



thus the indicator of the last event above tends to $0$ almost surely when we choose $\beta$ such that $H + \beta - 1 < 0$. This implies (39), and (37) is proved.

Step 2: consistency under discrete observations

When $H > 1/2$, the proof is identical to the one above. We use the probability deviation bound (21) instead of (20), and the discrete ergodic property (14) is invoked in place of (13). When $H < 1/2$, we use the deviation bound (21) and it remains to prove that

p (n\AZ h ) = p (Vx> - t k r™N(^) < P — — > o . 

V k=0 ) 

Let $m$ be such that $t_{m-1} < t_n/2 \le t_m$. Since $s \mapsto (t_n - s)^{1-2H}$ is a decreasing function, $(t_n - t_k)^{1-2H} \ge (t_n - t_n/2)^{1-2H} = (t_n/2)^{1-2H}$ for $k \le m-1$ and we deduce

ft n—l 
J0 k=0 

r t m—1 



fc=0 

> 



J fc=0 

aft) 1 - 2 "/" E~(^)W,)(»)*. 



where $\tilde t_k = t_k$ for $k \le m-1$ and $\tilde t_m = t_n/2$. We notice that the mesh size of this new time discretization is less than $\varepsilon_n$. The observation times are no longer equally spaced, but it is easy to check that this does not affect the results of Proposition 1. Thereby



„ hk m—1 

«vc* c (%r™ / 2 £ * (^) < ^ ^ 

I - 70 fc=o 

{ 1 m—1 
I'm JO ,_ n 



H+/3-1 



and, in view of the inequality (14) from Proposition 1, we may conclude as in the first step. □

4. The locally linear estimate

We follow the same structure as in Section 3. Nevertheless, since the notation becomes heavier, we distinguish the case of continuous observations from the discrete one.

4.1. Heuristic approach

The idea is similar to the one used in Section 3 and also follows the one developed in [29]. At first we discuss the case of a linear drift coefficient $b$ of the form $b_{\theta_0,\theta_1}(z) = \theta_0 + \theta_1 (z - x)/h$. Hence $b$ depends on two parameters $\theta_0$ and $\theta_1$ ($h > 0$ is fixed). Since $b_{\theta_0,\theta_1}(x) = \theta_0$, an estimator of $\theta_0$ is an estimator of the value of the drift at the point $x$. We denote

Xf = x + f ' bg Qfil (X e s )ds + Bf . 
Jo 

Similarly to (9), we introduce the observable process Y e = (Y t s )t>o defined by 

Y t e = x + f w H (t, s)bg ofil {X e s )d{M H ) s + M t H 
Jo 

= x + f\u( 6 e \)d{M H ) s + M^ 
with p = (p s )s>o is the process with values in R 2 defined by 

* = {Xl-x)/h )' 

and for a matrix $A$, $A^T$ denotes its transpose. Intuitively, the values $\theta_0$ and $\theta_1$ can be estimated by the least squares method (see for example [17]). If the $2\times2$ matrix

n t = f PsP ^d(M H ) s 

Jo 

is not singular, the least squares estimator of (Qq, #i) T obtained at time t is given by 
With the constant an defined in (15), we may write 

- J a ^ - s) \ {xl - x )/h (xl - xf/h 2 ) ds 

and we obtain the following expression of $\hat\theta_0(t)$:

$$\hat\theta_0(t) = \frac{m_2(t)}{\delta(t)}\int_0^t w_H(t,s)\,dY^{\theta}_s - \frac{m_1(t)}{\delta(t)}\int_0^t w_H(t,s)\,\Big(\frac{X^{\theta}_s - x}{h}\Big)\,dY^{\theta}_s,$$

where for $i \in \{0,1,2\}$:

$$m_i(t) = \int_0^t a_H^2\,(t-s)^{1-2H}\Big(\frac{X^{\theta}_s - x}{h}\Big)^i\,ds$$

and

$$\delta(t) = m_0(t)\,m_2(t) - m_1^2(t).$$

In the context of the fractional diffusion (1), the drift $b$ is not linear. Hence we approximate it by a linear function $\theta_0 + \theta_1(z - x)/h$ in a neighborhood $[x-h, x+h]$ of the point $x$. For this purpose we use a kernel function $N$ that satisfies Hypothesis 2. The above discussion is the starting point of the construction of a locally linear estimator of $b$ under continuous observations.



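4.2. Observations based on the whole trajectory

4.2.1. Construction and decomposition of the error

We give the following definition of a locally linear estimator of $b(x)$ by means of the observable processes $Y$ and $X$.

Definition 2. The locally linear estimator at time $t$ of $b(x)$ with the kernel $N$ and a bandwidth $h$ is defined by

$$\hat b^{ll}_{t,h}(x) = \frac{v_2(t)}{d(t)}\int_0^t w_H(t,s)\,N\Big(\frac{X_s - x}{h}\Big)\,dY_s - \frac{v_1(t)}{d(t)}\int_0^t w_H(t,s)\,\Big(\frac{X_s - x}{h}\Big)N\Big(\frac{X_s - x}{h}\Big)\,dY_s \qquad (40)$$

$$\phantom{\hat b^{ll}_{t,h}(x)} = \int_0^t \Big(\frac{v_2(t)}{d(t)} - \frac{v_1(t)}{d(t)}\,\frac{X_s - x}{h}\Big)\,a_H^2\,(t-s)^{1-2H}\,N\Big(\frac{X_s - x}{h}\Big)\,dX_s, \qquad (41)$$

where for $j = 0, 1, 2$:

$$v_j(t) = \int_0^t a_H^2\,(t-s)^{1-2H}\Big(\frac{X_s - x}{h}\Big)^j N\Big(\frac{X_s - x}{h}\Big)\,ds, \qquad (42)$$

$$d(t) = v_0(t)\,v_2(t) - (v_1(t))^2.$$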

The alternative expression (41) is obtained thanks to the definition of the process Y given 
in (7) and the relation 

wnit, s)wH(t, s) = a 2 H (t — s) l ~ 2H for t > s . 

Moreover, by the representation (8) and the facts that $N \le 1$ and that for all $z \in \mathbb{R}$, $|zN(z)| \le 1$, we notice that the stochastic integrals in (40) are well defined. We also remark that $d(t) \ge 0$ by the Cauchy-Schwarz inequality.

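In order to understand what kind of quantities will appear in the deviation probability bound, we write a decomposition of the error $\hat b^{ll}_{t,h}(x) - b(x)$. Using (8) and (40) we rewrite $\hat b^{ll}_{t,h}(x)$ as

$$\hat b^{ll}_{t,h}(x) = \frac{v_2(t)}{d(t)}\int_0^t a_H\,(t-s)^{1/2-H}\,N\Big(\frac{X_s - x}{h}\Big)\,dB_s - \frac{v_1(t)}{d(t)}\int_0^t a_H\,(t-s)^{1/2-H}\Big(\frac{X_s - x}{h}\Big)N\Big(\frac{X_s - x}{h}\Big)\,dB_s + \int_0^t \Big(\frac{v_2(t)}{d(t)} - \frac{v_1(t)}{d(t)}\,\frac{X_s - x}{h}\Big)\,a_H^2\,(t-s)^{1-2H}\,N\Big(\frac{X_s - x}{h}\Big)\,b(X_s)\,ds.$$

Now we consider the local error function $\delta_{x,h}$ defined by

$$\delta_{x,h}(z) = b(z) - \big(b(x) + b'(x)\,(z - x)\big).$$

By the definitions of the functions $v_j$ it holds that

$$\int_0^t \Big(\frac{v_2(t)}{d(t)} - \frac{v_1(t)}{d(t)}\,\frac{X_s - x}{h}\Big)\,a_H^2\,(t-s)^{1-2H}\,N\Big(\frac{X_s - x}{h}\Big)\,\big(b(x) + b'(x)(X_s - x)\big)\,ds = b(x).$$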

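Now for $j = 0, 1$ we denote

$$\nu_j(t,s) = a_H\,(t-s)^{1/2-H}\Big(\frac{X_s - x}{h}\Big)^j N\Big(\frac{X_s - x}{h}\Big)$$

and $(V_t)_{t\ge0}$ is the process with values in $\mathbb{R}^{2\times2}$ defined by

$$V_t = \begin{pmatrix} v_0(t) & v_1(t) \\ v_1(t) & v_2(t) \end{pmatrix}. \qquad (43)$$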



Thus we have again the expression

$$\hat b^{ll}_{t,h}(x) = b(x) + \xi_{x,h}(X_t) + r^{loc}_{x,h}(X_t) \qquad (44)$$

where $\xi_{x,h}(X_t)$ and $r^{loc}_{x,h}(X_t)$ are the first components of the following two-dimensional vectors



;h(Xt) = (V t ) 



<^) = (v t r/ ( : u )7:( ) 5 x , h (x s )d S . 



-l /"* / v Q {t,s) 

Jo \ Mt> s ) 

io V ^(M) 

The interpretation of $\xi_{x,h}(X_t)$ and $r^{loc}_{x,h}(X_t)$ is the same as in the Nadaraya-Watson procedure. It is important to notice that when $H = 1/2$ we obtain the same decomposition of $\hat b^{ll}_{t,h}(x) - b(x)$ as the one in [29, Eq. (5.3)].

4.2.2. Deviation probability and consistency

In view of the term $r^{loc}_{x,h}(X_t)$ in (44), the accuracy of the locally linear estimate is expressed through the quality of the approximation of $b$ by a linear function. Under Hypothesis 1.a) it is natural to introduce in the neighborhood $[x-h, x+h]$ of the point $x$ the quantity

$$\Delta_{x,h} = \sup_{|z-x|\le h}\big|b(z) - \big(b(x) + b'(x)\,(z - x)\big)\big|.$$

In order to study the error from a probabilistic point of view (see $\xi_{x,h}(X_t)$ in (44)), we make the following comments. If the kernel function $N$ satisfies $N^2 = N$ and if $H = 1/2$, the process $V = (V_t)_{t\ge0}$ defined in (43) is the quadratic variation process of the two-dimensional martingale $M = (M_t)_{t\ge0}$ defined by

$$M_t = \int_0^t \begin{pmatrix} \nu_0(t,s) \\ \nu_1(t,s) \end{pmatrix} dB_s.$$

If we investigate the strong consistency of our estimator we shall use a strong law of large numbers for multivariate martingales (see [17, 15. 5, 32]). Therefore the strong consistency is a consequence of asymptotic properties, as $t$ goes to infinity, of the eigenvalues of the matrix $V_t$. In the fractional framework this kind of asymptotic behavior has not yet been studied. Nevertheless, the eigenvalues of $V_t$ play a crucial role in the following.

Thus we introduce for some $\rho > 0$ and $\beta > 0$ the random set

$$\mathcal{A}^{ll}_{t,h} = \big\{\lambda_m(t) \ge \rho\,t^{1-H+\beta}\big\},$$

where $\lambda_m(t)$ is the smallest eigenvalue of the matrix $V_t$. The properties of $\hat b^{ll}_{t,h}(x)$ are first studied restricted to the event $\mathcal{A}^{ll}_{t,h}$. The consistency is proved under Hypotheses 1.b) and 1.c), which ensure the ergodicity of (1), and under the following non-degeneracy condition on the law of the stationary solution.

Hypothesis 5. The law of $\bar X$ is (strongly) non-degenerate in a neighborhood of $x$: for any small bandwidth $h$ it holds:

$$E\Big[\Big(\frac{\bar X - x}{h}\Big)^2 N\Big(\frac{\bar X - x}{h}\Big)\Big] > 0.$$

Notice that we use the terminology "strongly non-degenerate" because Hypothesis 5 implies Hypothesis 4.

Theorem 7. Let $x$ be fixed.

7.i) If $b$ satisfies Hypothesis 1.a), then for any $\zeta > 0$, we have

P (|SJ fc (x) - b(x)\ > c p , H A x , h t 1 -*-? + C , A\h) < 4exp (- { -^f C 2 t^) , (45) 

with $c_{\rho,H} = \sqrt{2}\,c_H^2/(\rho\,\lambda_H)$.

7.ii) When we assume Hypotheses 1.b), 1.c) and 5, the fractional diffusion is ergodic and we have the consistency of the locally linear estimator:

$$\hat b^{ll}_{t,h}(x) \xrightarrow[t\to\infty,\,h\to0]{\text{in probability}} b(x).$$

The proof of this result is postponed to Section 5.

If we consider that $\zeta = t^{-\beta/2}$, (45) implies that



- b(x)\ > c p , H A Xth t'- H -e + , A» h ) < 4exp (-fi^ # 



The quality of the approximation of $b$ by a linear function is measured by $\Delta_{x,h}$ in a vicinity of $x$. Under Hypothesis 1.a), we have $\Delta_{x,h} \le 2L_x h$. Assume also that $b$ is twice differentiable in a neighborhood of $x$ with second derivative bounded by $L_x$; then $\Delta_{x,h} \le L_x h^2/2$. Now we are able to choose a time-dependent bandwidth $h_t$. Clearly if $h_t^2 \asymp L_x^{-1}\,t^{H-1+\beta/2}$ (recall that the symbol $\asymp$ means that the ratio of the two functions is bounded), we obtain that the rate of estimation is of order $t^{-\beta/2}$:

P (\bUx) - b(x)\ > c p ,H rW , A\\h) < 4ex P (" i ^f £! , 

where $c'_{\rho,H} = c_{\rho,H}/2 + 1$ and of course $\beta$ has been chosen such that $\beta < 2(1 - H)$, so that $h_t$ tends to $0$.
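The bandwidth choice above can be checked with a short computation. This is a minimal sketch under stated assumptions: the proportionality constant in the relation $\asymp$ is taken equal to $1$, and the function names and numerical values are hypothetical. It confirms that the approximation term $c\,\Delta_{x,h_t}\,t^{1-H-\beta}$, bounded through $\Delta_{x,h} \le L_x h^2/2$, is then of order $t^{-\beta/2}$.

```python
def bandwidth(t: float, H: float, beta: float, L_x: float) -> float:
    """Bandwidth h_t with h_t^2 = t^(H - 1 + beta/2) / L_x.

    Assumption: the constant hidden in the asymptotic relation is 1.
    """
    return (t ** (H - 1 + beta / 2) / L_x) ** 0.5


def approximation_term(t, H, beta, L_x, c=1.0):
    """c * (L_x h_t^2 / 2) * t^(1 - H - beta), the bound on the bias term."""
    h = bandwidth(t, H, beta, L_x)
    return c * (L_x * h * h / 2) * t ** (1 - H - beta)


# Illustrative values (assumptions): beta < 2(1 - H) so that h_t -> 0.
H, beta, L_x = 0.7, 0.2, 3.0
for t in (1e2, 1e4, 1e6):
    print(t, bandwidth(t, H, beta, L_x), approximation_term(t, H, beta, L_x, c=2.0))
```

With `c = 2.0` the approximation term equals $t^{-\beta/2}$ exactly, which matches the constant $c_{\rho,H}/2$ appearing in the rate.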

Since $b$ is unknown, we have no reason to have information about $L_x$. As usual in such a nonparametric context we have two choices. On the one hand, we can restrict our problem to a class of drift functions $b$ satisfying the above hypotheses with constants $L_x$ such that $L_x \in (L_{\min}, L_{\max})$. On the other hand, an adaptive (data-driven) choice of the bandwidth may be considered (see [29] and the references therein). Unfortunately, the analysis of the error given in Theorem 2 does not seem to be suited to this powerful method of bandwidth selection.

To continue the comparison with the work of Spokoiny (see [29]), we may relate our random set $\mathcal{A}^{ll}_{t,h}$ and the one that appears in [29]. Indeed, very simple calculations allow us to write the exact expression of the smallest eigenvalue of the matrix $V_t$ as

$$\lambda_m(t) = \frac{1}{2}\Big(v_0(t) + v_2(t) - \big((v_0(t) + v_2(t))^2 - 4\,d(t)\big)^{1/2}\Big). \qquad (46)$$
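Formula (46) is the usual closed form for the smallest eigenvalue of a symmetric $2\times2$ matrix. The sketch below is only an illustration; the function name and the sample entries are assumptions. It evaluates the formula and checks that the result is a root of the characteristic polynomial $\lambda^2 - (v_0 + v_2)\lambda + d$.

```python
import math


def smallest_eigenvalue(v0: float, v1: float, v2: float) -> float:
    """Smallest eigenvalue of [[v0, v1], [v1, v2]] via the closed form (46)."""
    d = v0 * v2 - v1 * v1                      # determinant d(t)
    trace = v0 + v2
    # trace^2 - 4 d = (v0 - v2)^2 + 4 v1^2 >= 0, so the square root is real.
    return 0.5 * (trace - math.sqrt(trace * trace - 4.0 * d))


# Sample entries (assumptions).
lam = smallest_eigenvalue(3.0, 0.5, 1.2)
residual = lam * lam - (3.0 + 1.2) * lam + (3.0 * 1.2 - 0.5 * 0.5)
print(lam, residual)
```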

The above expression involves quantities analogous to those appearing in the random set $A_h$ (see [29, Page 819]). Despite these analogies, it is not easy to compare the two events. Moreover, the discussion about the accuracy of the approximation and the "stochastic error" is different from the one made in [29]. This is due to the fact that the stochastic error is hidden in the random set $\mathcal{A}^{ll}_{t,h}$, whereas it appears explicitly as a "conditional variance" in the work of Spokoiny.

Now we give an effective way to estimate the drift when we consider discrete observations.

4.3. Discrete observations

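We consider the discretization of the quantities that are defined in (42). For $j = 0, 1, 2$,

$$v^n_j(t_n) = \int_0^{t_n}\sum_{k=1}^{n-1} a_H^2\,(t_n - t_k)^{1-2H}\Big(\frac{X_{t_k} - x}{h}\Big)^j N\Big(\frac{X_{t_k} - x}{h}\Big)\,\mathbf{1}_{[t_k,t_{k+1})}(s)\,ds,$$

$$d^n(t_n) = v^n_0(t_n)\,v^n_2(t_n) - (v^n_1(t_n))^2$$

and we denote by $V^n_{t_n}$ the matrix

$$V^n_{t_n} = \begin{pmatrix} v^n_0(t_n) & v^n_1(t_n) \\ v^n_1(t_n) & v^n_2(t_n) \end{pmatrix}.$$

Considering (41), we propose the following estimator of $b(x)$ based on discrete observations.

Definition 3. The discretized locally linear estimator at time $t_n$ with a bandwidth $h$ is

$$\hat b^{ll}_{t_n,h}(x) = \sum_{k=1}^{n-1}\Big(\frac{v^n_2(t_n)}{d^n(t_n)} - \frac{v^n_1(t_n)}{d^n(t_n)}\,\frac{X_{t_k} - x}{h}\Big)\,a_H^2\,(t_n - t_k)^{1-2H}\,N\Big(\frac{X_{t_k} - x}{h}\Big)\,(X_{t_{k+1}} - X_{t_k}).$$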
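The discretized estimator can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the author's code: the kernel is taken to be the indicator of $[-1,1]$ (a hypothetical choice satisfying $0 \le N \le 1$ and $N^2 = N$), the function names are invented, and the constant $a_H^2$ is omitted because it cancels between the weights and $d^n(t_n)$. The sum runs over $k = 1, \dots, n-1$ as in the definition.

```python
def indicator_kernel(z: float) -> float:
    """Assumed kernel: indicator of [-1, 1]."""
    return 1.0 if abs(z) <= 1.0 else 0.0


def local_linear_estimate(times, X, x, h, H, kernel=indicator_kernel):
    """Discretized locally linear estimate of b(x).

    times : observation times t_0 < t_1 < ... < t_n
    X     : observations X_{t_0}, ..., X_{t_n}
    The factor a_H^2 is dropped since it cancels in the final ratio.
    """
    t_n = times[-1]
    v = [0.0, 0.0, 0.0]
    terms = []
    for k in range(1, len(times) - 1):          # k = 1, ..., n - 1
        u = (X[k] - x) / h
        w = (t_n - times[k]) ** (1 - 2 * H) * kernel(u)
        dt = times[k + 1] - times[k]
        for j in range(3):
            v[j] += w * u ** j * dt             # discretized v_j^n(t_n)
        terms.append((u, w, X[k + 1] - X[k]))
    d = v[0] * v[2] - v[1] ** 2                 # d^n(t_n)
    if d <= 0.0:
        raise ValueError("degenerate design: d^n(t_n) is not positive")
    return sum((v[2] - v[1] * u) * w * dX for u, w, dX in terms) / d


# Synthetic check (assumption): increments generated by an exactly linear
# drift theta0 + theta1 (X - x)/h, with no noise.
theta0, theta1, x, h, H, dt = 1.5, -0.7, 0.0, 1.0, 0.7, 0.01
times = [k * dt for k in range(101)]
X = [-0.5]
for k in range(100):
    X.append(X[-1] + (theta0 + theta1 * (X[-1] - x) / h) * dt)
estimate = local_linear_estimate(times, X, x, h, H)
print(estimate)
```

On this noiseless linear example the estimator recovers $\theta_0 = b(x)$ up to rounding, which is the algebraic identity behind the least squares construction of Subsection 4.1.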



B. Saussereau/Nonparametric inference for fractional diffusion 
As in (19), we will decompose $\hat b^{ll}_{t_n,h}(x) - b(x)$ into a sum of three terms:

$$\hat b^{ll}_{t_n,h}(x) - b(x) = \xi_{x,h}(X_{t_n}) + r^{loc}_{x,h}(X_{t_n}) + r^{traj}_{x,h}(X_{t_n}). \qquad (47)$$

For this purpose, we consider the simple process



n-l 



fc=i 



v£(tn) V?(t n ) /X tfc -x 



_d"(t n ) d"(t n )V ft 

Since = f^" T™dX s , we use (7) and (8) to write that 



o w H (t n ,s) 

b(X s )T? ds + {\ H {2-2H)) 

t 



1/2 



T? 



o 



o w H (t n ,s) 

b(x 8 ) r; d s + c^ 1 / " r; (i n - s)"" 1 / 2 dB s . 







As in the case of continuous observations, 



£,x,h(Xt 



a 



T^(t n -s) H -^ 2 dB s 



o 



is the first component of the two dimensional random vector 



1 'x,h V v t n 



n \-l 



\ A 4 V-ni s ) 



where for j = 0, 1: 



n— 1 



k=l 

The last two terms in (47) come from the equality: 

r; b(x 8 ) ds = / t? (b(x s ) - b(x tk )) ds + / r; (b(x th ) - b(x)) ds 

Jo Jo 

Easy but tedious computations yield that $r^{traj}_{x,h}(X_{t_n})$, respectively $r^{loc}_{x,h}(X_{t_n})$, is the first component of

-1 f ™ ^ ro traj(^n' s ) 



<»J = (vr„r 1 / 

■/ o 



respectively 



< c j^j = (vrj- 



(is 
where we have denoted for $j = 0, 1$:



ra-1 



<)(*»>*) = E^(*n " tk) l - 2H (^)°N(^) (b(X s ) - b(X tk ))l [tkA+l) (s) 
k=l 
n-1 

"LM =T. a H{tn-t k f- m (^)°N(^) 5 x>h (X tk )l [tkjtk+l) (s) . 



k=l 

Remark 6. As already noticed in the Nadaraya-Watson type estimation procedure (see Subsection 3.1), three terms appear in the decomposition (47). Each of them has a precise meaning:

• the first term, $\xi_{x,h}(X_{t_n})$, is a "stochastic error term";

• $r^{loc}_{x,h}(X_{t_n})$ represents again the accuracy of the local approximation of $b$ by a linear function in a neighborhood of the point $x$ at the discrete times $(t_k)_{0\le k\le n}$;

• $r^{traj}_{x,h}(X_{t_n})$ is the error due to the discretization of the continuous process $(X_s)_{s\ge0}$.

The next result establishes the probability deviation bound for the discrete locally linear estimator of $b(x)$.

Theorem 8. We assume that $b$ is Lipschitz with Lipschitz constant $L$. There exist $u_1, u_2 > 0$ such that for any $t_n$ large enough, we have the conditional deviation probability bound:

P(\bi, h (x)-b(x)\ > CpjH A xA t 1 - H ^ + c 1 e n t 1 - H -f } + C, A™ h ) 

< 4ex P (-^tJ^ #) + C exp (-^ C 

where we have set:

- $\mathcal{A}^{ll,n}_{t_n,h} = \big\{\lambda^n_m(t_n) \ge \rho\,t_n^{1-H+\beta}\big\}$ with $\beta$ a positive real number such that $1 - \beta < \gamma H$;

- $\lambda^n_m(t_n)$ denotes the smallest eigenvalue of the matrix $V^n_{t_n}$;

- $c_1 > 0$ depends on $\rho$, $L$, $x$, $h$ and $H$;

- $c_2 > 0$ depends on $\rho$, $H$ and $L$.

The proof of this result is given in Section 5.

All the constants in (48) are known explicitly; the interested reader will find them in the proof. The next theorem is one of the most important results of this work since it establishes the convergence of the discretized estimator toward the unknown value $b(x)$.

Theorem 9. Assume that Hypotheses 1.b), 1.c) and 5 hold. If moreover the number of approximation points satisfies $n \asymp t_n^{\gamma}$ with $\gamma > \max\big(1 + (m^2 + 2)H;\ 3\big)$, then the locally linear estimator of $b(x)$ is consistent:

$$\hat b^{ll}_{t_n,h}(x) \xrightarrow[t_n\to\infty,\,h\to0]{\text{in probability}} b(x).$$

Proof. Since we apply Proposition 1 with $z \mapsto z^2 N(z)$, which satisfies (12) with $p = 2$, the condition on $\gamma$ is justified. Then we just have to use (48) with $1 - \gamma H < \beta < 1 - H$ and we argue as in the proof of (38) from Theorem 6. □

5. Proofs

5.1. Proof of Theorem 7



We split the proof into separate steps. The starting point is the decomposition (44). We treat each term of $\hat b^{ll}_{t,h}(x) - b(x)$ separately.

We recall some basic facts from linear algebra. We denote for a vector $z = (z_1, z_2)^T \in \mathbb{R}^2$, $\|z\|_{\infty} = |z_1| \vee |z_2|$ and $\|z\|_2$ its Euclidean norm. For any $t \ge 0$, $0 \le \lambda_m(t) \le \lambda_M(t)$ are the eigenvalues of the symmetric matrix $V_t$. For $y = (y_1, y_2)^T \in \mathbb{R}^2$, we denote $z = (z_1, z_2)^T = (V_t)^{-1}y$ and it holds



Z no ^ 2 



12/1 1 



+ 



ml 



2 \ 1/2 



A m (i) 2 \ M (t) 



< y/2 



\m\ \ 

A m (t) " X M (t)J ' 



I 



V 



(49) 



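5.1.1. Proof of 7.i)

We study $r^{loc}_{x,h}(X_t)$. Since for any real $z$, $0 \le N(z) \le 1$ and $z^2 N(z) \le N(z)$, we have the inequality $v_2(t) \le v_0(t)$. By the Cauchy-Schwarz inequality we obtain

$$\Big|\int_0^t a_H^2\,(t-s)^{1-2H}\Big(\frac{X_s - x}{h}\Big)N\Big(\frac{X_s - x}{h}\Big)\,\delta_{x,h}(X_s)\,ds\Big| \le \Delta_{x,h}\,\big(v_0(t)\,v_2(t)\big)^{1/2} \le \Delta_{x,h}\,v_0(t),$$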



and thus 



/ j/i(t,s)4 ife (X s )(ii 
J o 



< A;,.^ v (i) . 



The relation (49) yields 

\r l °i(X t )\ < \\R X!h {X, 



t)\\oo 

t ~ 



< \l2 



Jo vo{t, s)6 x>h (X s )ds\ | / v\(t, s)5 x , h (X s )di 



X m (t) Am(*) 
< V2 A Xjh { W ) 



and consequently 



r'°UX t )\<V2A 



x,h 



x m (t) 



Since Vo(t) < {c 2 H /\}{)t 2 2H , we deduce the following bound on the random set .Aj 1 ^ 



(50) 



(51) 



with $c_{\rho,H} = \sqrt{2}\,c_H^2/(\rho\,\lambda_H)$.

For the analysis of $\xi_{x,h}(X_t)$, we consider $\zeta > 0$ and by (49) we may write

p(l6UMI>c, Kh) 

<p(||H^(X t )||oo>C, A\ h \ 
< P



< P 



Jq v (t,s)dB s f Q u 1 (t,s)dB, 



jQU (t,s)dB s 



V 



A M (i) 
>C/V2, AK] fP 



> C/V2 , A\ 



Since $\lambda_M(t) \ge \lambda_m(t) \ge \rho\,t^{1-H+\beta}$ on the random set $\mathcal{A}^{ll}_{t,h}$, it follows that:



> C/V2 , J$ 



p(ICU x *)I ^ C , 4a) ^ p ( £Mt,s)dB, 

+ p 



> 4 1 1 -*- 



PC +1-H+/3 



V2 



(52) 



For $j = 0, 1$, $\big|\big(\frac{X_s - x}{h}\big)^j N\big(\frac{X_s - x}{h}\big)\big| \le 1$. Then we may apply the exponential inequality (22) and we obtain

P(l£ lh (**)l > C , Al h ) < 4exp C 2 ■ (53) 

Thanks to the decomposition (44) and the bounds (51) and (53), we deduce that 

P (\i>l h (x) - b(x)\ > c p , H A Xih t l ~ H -P + C , A\) h ] 

< P(\£ )h (X t )\ + \r^ h (X t )\ > c p , H A x>h t l - H -P + C , A\) h 

n 

f./i 



<P(|d fc M|>c, A 



< 4exp 



and the proof of (45) is now complete.



(l-g)p 2 z-2 ,2/3 



□ 



5.1.2. Proof of 7.ii)

We follow the same arguments as the ones used in the proof of Theorem 6. With $\beta < 1 - H$, we need to show that

$$P\big(\Omega\setminus\mathcal{A}^{ll}_{t,h}\big) = P\big(\lambda_m(t) < \rho\,t^{1-H+\beta}\big) \xrightarrow[t\to\infty]{} 0. \qquad (54)$$

Since < N < 1 and z 2 N(z) < N(z) for any real z, \ m (t) > v 2 (t) by (46). Thus 

p(n\^ h )<p(v*(t)<,t 1 -*+' J ) 



< p 
When $H > 1/2$, $(t-s)^{1-2H} \ge t^{1-2H}$ and consequently



Since Hypothesis 5 holds, we may apply Proposition 1. So (54) is true and the result is proved for $H > 1/2$. When $H < 1/2$, we use $(t-s)^{1-2H} \ge (t - \tfrac{t}{2})^{1-2H} = (\tfrac{t}{2})^{1-2H}$ for $s \le t/2$ and we write



The concluding arguments are unchanged.



5.2. Proof of Theorem 8 

We treat separately each term of the decomposition (47). Repeating the arguments that led 
us to (52) yields that on Af h 



p(\L,h(X tn )\ > C , Al, h ) < P (\j t Q n fJ ,°(t n ,s)dB. 



PC + i- 



> ^= t 



H+/3 



V2 



+ P 



Jj" f L 1 (t n ,s)dB 6 



We follow the arguments that led us to (25). We fix t n and for j = 0,1, we consider the 
martingales Z n ' J := (^r lJ )o<r<*„ defined by 



Z?* = / H j (t n ,s) dB s . 



Since |( s h x ) J A r ( a h x )\ < 1, the quadratic variations of the martingales Z n ' 3 satisfy 



9 2 

\6 ' J )r < l _ H l n 



Then by Lemma 3 we obtain that 



p(\L,h(X tn )\>C, Al h ) <4exp 



P 2 (l-g) C 2 + 2/J 



"2 tr 



(55) 



Now we deal with i"*V(Xt n ) an d r xh(Xt„)- For j = 0, 1 we have a discrete version of (50) 



^l oc (t n , s)ds 
Arguing as in the proof of (51),

$$|r^{loc}_{x,h}(X_{t_n})| \le c_{\rho,H}\,\Delta_{x,h}\,t_n^{1-H-\beta} \qquad (56)$$

on the random set $\mathcal{A}^{ll,n}_{t_n,h}$ (we recall that $c_{\rho,H} = \sqrt{2}\,c_H^2/(\rho\,\lambda_H)$). We use (27) and we obtain similarly,

t 

ro trai(*«> S ) ds 







< [c xXL e n + Lel\\B H \\ 0UM ) v$(t n ) , 
with c X: h,L = L {Lh + q,(l + |x| m )}. Finally on the set h it holds 

|r5f(XOI < U + c 2 el \\B H \\ , tn>t) ) t\~ H ^ (57) 



with $c_1 = c_{\rho,H}\,c_{x,h,L}$ and $c_2 = c_{\rho,H}\,L$. As in the proof of (21) (see Subsection 3.2), we combine the inequalities (55), (56) and (57) and we deduce:



£ ««P (- S ^f £ + P(ll^llo,,,» > 5^^) • (58) 

By Lemma 10 we have 

P(||^|| ,t,^ > _ A-H-f, ) < CH, h (1 +^-")x 

\ Z C2 c n Ctj, / 

pxn f C_ ^( 7 -l)+H-l+/3 / 1 _ 256 c 2 ^(ff-f,) .-f,( 7 -l)-if+l-/AA 

e " I 2c 2 11 I (TP n n J J 

< c HM (1 + ^~") >< 
pxn f £_ y.&(7-l)+H-l+/8 fi _ 256 E? c 2 f -hh+D+H+l-0\ \ 

With f3 < 1 — jH one may choose f) such that 

max(i^, l±f^)<h<F. 
When t n is large enough, we obtain 

P(ll^ll,., t > 5 ^^)<2= H , lt ?-texp(-4 t t(^- 1+S ) . 

Substituting the above estimate into (58), the proof is complete when we set $u_1 = H - \mathfrak{h}$ and $u_2 = \mathfrak{h}(\gamma - 1) + H - 1 + \beta$.

Appendix A: A Fernique-type lemma

The exponential moments of the Hölder norm of the trajectories of a fBm are classical results from the theory of Gaussian processes (see [6] for example). Nevertheless, we are interested in the large-time behavior of this moment. So we prove in this appendix the following Fernique-type lemma, in which we make the time dependence of the estimate precise.

Lemma 10. Let $T > 0$, $0 < \mathfrak{h} < H < 1$. We denote

$$\|B^H\|_{0,T,\mathfrak{h}} = \sup_{0\le s,t\le T}\frac{|B^H_t - B^H_s|}{|t-s|^{\mathfrak{h}}}.$$

Then for any $T \ge T_0 = (\mathfrak{h}/(8H))^{1/(H-\mathfrak{h})}$ there exists a constant $c_{H,\mathfrak{h}}$ such that

$$E\big[\exp\big(\|B^H\|_{0,T,\mathfrak{h}}\big)\big] \le c_{H,\mathfrak{h}}\,\big(1 + T^{H-\mathfrak{h}}\big)\exp\Big(128\,\frac{H^2}{\mathfrak{h}^2}\,T^{2(H-\mathfrak{h})}\Big). \qquad (59)$$

The explicit form of $c_{H,\mathfrak{h}}$ is given in (64).
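Proof. We denote by $[z]$ the integer part of a non-negative real $z$. First we prove that

$$|B^H_t - B^H_s| \le \xi_{H,\mathfrak{h},T}\,|t-s|^{\mathfrak{h}}, \qquad (60)$$

where $\xi_{H,\mathfrak{h},T}$ is a positive random variable such that for any $p > p_0 := [2/(H-\mathfrak{h})]$

$$E\big(\xi^p_{H,\mathfrak{h},T}\big) \le \Big(16\,\frac{H}{\mathfrak{h}}\Big)^p\,T^{p(H-\mathfrak{h})}\,(p-1)!!. \qquad (61)$$

The double factorial of a positive integer is defined by

$$(2k-1)!! = \prod_{i=1}^{k}(2i-1) = \frac{(2k)!}{2^k\,k!}, \qquad (2k)!! = \prod_{i=1}^{k}(2i) = 2^k\,k!.$$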
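The two closed forms for the double factorial are easy to confirm by direct computation. The sketch below is only a sanity check; the helper name `double_factorial` is an assumption, with the usual convention that the double factorial of $0$ and of $-1$ equals $1$.

```python
from math import factorial


def double_factorial(p: int) -> int:
    """p!! = p (p - 2) (p - 4) ..., with the convention 0!! = (-1)!! = 1."""
    result = 1
    while p > 1:
        result *= p
        p -= 2
    return result


for k in range(1, 8):
    odd = double_factorial(2 * k - 1)
    even = double_factorial(2 * k)
    # Compare with (2k)! / (2^k k!) and 2^k k! respectively.
    print(k, odd, factorial(2 * k) // (2 ** k * factorial(k)), even, 2 ** k * factorial(k))
```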
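We remark that when $p \le p_0$, we obtain easily that

$$E\big(\xi^p_{H,\mathfrak{h},T}\big) \le \Big(16\,\frac{H}{\mathfrak{h}}\Big)^p\,T^{p(H-\mathfrak{h})}\,p_0!!. \qquad (62)$$

In order to prove (60) and (61), we proceed as follows. With $\psi(u) = u^{2/(H-\mathfrak{h})}$ and $p(u) = u^H$ in Lemma 1.1 of [8], the Garsia-Rodemich-Rumsey inequality reads

$$|B^H_t - B^H_s| \le 8\int_0^{|t-s|}\psi^{-1}\Big(\frac{4D}{u^2}\Big)\,dp(u),$$

where the random variable $D$ is

$$D = \int_0^T\!\!\int_0^T \frac{|B^H_t - B^H_s|^{2/(H-\mathfrak{h})}}{|t-s|^{2H/(H-\mathfrak{h})}}\,ds\,dt.$$

We have

$$|B^H_t - B^H_s| \le 8H\,(4D)^{(H-\mathfrak{h})/2}\int_0^{|t-s|} u^{\mathfrak{h}-1}\,du \le 8\,\frac{H}{\mathfrak{h}}\,(4D)^{(H-\mathfrak{h})/2}\,|t-s|^{\mathfrak{h}}.$$

We denote $\xi_{H,\mathfrak{h},T} = 8\,\frac{H}{\mathfrak{h}}\,(4D)^{(H-\mathfrak{h})/2}$. By Jensen's inequality, for $p \ge 2/(H-\mathfrak{h})$ it holds

$$E\big(\xi^p_{H,\mathfrak{h},T}\big) \le \Big(8\cdot2^{H-\mathfrak{h}}\,\frac{H}{\mathfrak{h}}\Big)^p E\big(D^{p(H-\mathfrak{h})/2}\big) \le \Big(16\,\frac{H}{\mathfrak{h}}\Big)^p\,T^{p(H-\mathfrak{h})-2}\int_0^T\!\!\int_0^T \frac{E\big(|B^H_t - B^H_s|^{p}\big)}{|t-s|^{pH}}\,ds\,dt \le \Big(16\,\frac{H}{\mathfrak{h}}\Big)^p\,T^{p(H-\mathfrak{h})}\,E(|Z|^p),$$

where $Z$ is a Gaussian random variable with zero mean and unit variance. Since

$$E(|Z|^p) = \begin{cases} \sqrt{2/\pi}\,(p-1)!!, & \text{when } p \text{ is odd;} \\ (p-1)!!, & \text{when } p \text{ is even,} \end{cases}$$

we deduce (60) and (61).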
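The formula for the absolute moments of a standard Gaussian variable can be compared with the classical Gamma-function expression $E|Z|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$. The following is a small numerical check under that standard identity; the function names are illustrative assumptions.

```python
import math


def abs_moment_gamma(p: int) -> float:
    """E|Z|^p for Z ~ N(0, 1), via 2^(p/2) Gamma((p + 1)/2) / sqrt(pi)."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def abs_moment_double_factorial(p: int) -> float:
    """E|Z|^p via (p - 1)!!, times sqrt(2/pi) when p is odd."""
    df, q = 1, p - 1
    while q > 1:                # (p - 1)!! with the convention 0!! = 1
        df *= q
        q -= 2
    return df * (math.sqrt(2 / math.pi) if p % 2 else 1.0)


for p in range(1, 9):
    print(p, abs_moment_gamma(p), abs_moment_double_factorial(p))
```

In particular both expressions are bounded by $(p-1)!!$, which is the only fact used to pass to (61).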

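What remains to be shown can be tediously deduced from Theorem 1.3.2 in [6]. We can also make the following direct computations. Using (60), (61) and (62) we have

$$E\big(\exp(\|B^H\|_{0,T,\mathfrak{h}})\big) \le E\big(\exp(\xi_{H,\mathfrak{h},T})\big) \le \sum_{p=0}^{\infty}\frac{E(\xi^p_{H,\mathfrak{h},T})}{p!} \le p_0!!\sum_{p=0}^{p_0}\frac{c^p_{H,\mathfrak{h},T}}{p!} + \sum_{p=p_0+1}^{\infty}\frac{(p-1)!!}{p!}\,c^p_{H,\mathfrak{h},T} \le p_0!!\,\exp(c_{H,\mathfrak{h},T}) + \sum_{p=0}^{\infty}\frac{(p-1)!!}{p!}\,c^p_{H,\mathfrak{h},T},$$

where we have denoted $c_{H,\mathfrak{h},T} = 16\,(H/\mathfrak{h})\,T^{H-\mathfrak{h}}$. We notice that

$$\frac{(p-1)!!}{p!} = \begin{cases} 1/(2^k\,k!), & \text{when } p = 2k; \\ 1/(2k+1)!!, & \text{when } p = 2k+1. \end{cases}$$

Since $(2k+1)!! \ge \prod_{i=1}^{k} 2i = 2^k\,k!$, we obtain

$$E\big(\exp(\|B^H\|_{0,T,\mathfrak{h}})\big) \le p_0!!\,\exp(c_{H,\mathfrak{h},T}) + \sum_{k=0}^{\infty}\frac{c^{2k}_{H,\mathfrak{h},T}}{2^k\,k!} + \sum_{k=0}^{\infty}\frac{c^{2k+1}_{H,\mathfrak{h},T}}{2^k\,k!}$$

$$\le p_0!!\,\exp(c_{H,\mathfrak{h},T}) + \sum_{k=0}^{\infty}\Big(\frac{c^2_{H,\mathfrak{h},T}}{2}\Big)^k\frac{1}{k!} + c_{H,\mathfrak{h},T}\sum_{k=0}^{\infty}\Big(\frac{c^2_{H,\mathfrak{h},T}}{2}\Big)^k\frac{1}{k!}$$

$$\le p_0!!\,\exp(c_{H,\mathfrak{h},T}) + (1 + c_{H,\mathfrak{h},T})\exp\big(c^2_{H,\mathfrak{h},T}/2\big). \qquad (63)$$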
With 



+ ^ , (64) 



the lemma is proved because when $T^{H-\mathfrak{h}} \ge \mathfrak{h}/(8H)$, $c_{H,\mathfrak{h},T} \le c^2_{H,\mathfrak{h},T}/2$. □

We remark that if $T$ does not satisfy the condition $T^{H-\mathfrak{h}} \ge \mathfrak{h}/(8H)$, then one may replace (59) by (63).
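The role of the threshold on $T$ can be made concrete: the inequality $c \le c^2/2$ holds exactly when $c \ge 2$, and $c_{H,\mathfrak{h},T} = 16(H/\mathfrak{h})T^{H-\mathfrak{h}}$ reaches the value $2$ at $T_0 = (\mathfrak{h}/(8H))^{1/(H-\mathfrak{h})}$. The sketch below checks this numerically; the function names and sample values are illustrative assumptions.

```python
def c_const(H: float, hoelder: float, T: float) -> float:
    """c_{H,h,T} = 16 (H / h) T^(H - h), with h the Hoelder exponent."""
    return 16 * (H / hoelder) * T ** (H - hoelder)


def threshold_T0(H: float, hoelder: float) -> float:
    """T_0 = (h / (8 H))^(1 / (H - h))."""
    return (hoelder / (8 * H)) ** (1 / (H - hoelder))


# Sample values (assumptions).
H, hoelder = 0.7, 0.5
T0 = threshold_T0(H, hoelder)
print(T0, c_const(H, hoelder, T0))
for T in (T0, 1.0, 10.0):
    c = c_const(H, hoelder, T)
    print(T, c, c * c / 2)
```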

Remark 7. Thanks to (60) and (61), we have obtained the following estimate for the moments of the Hölder norm of the trajectories of the fBm:

$$E\big(\|B^H\|^p_{0,T,\mathfrak{h}}\big) \le \Big(16\,\frac{H}{\mathfrak{h}}\Big)^p\,T^{p(H-\mathfrak{h})}\,(p-1)!! \qquad (65)$$

for any $p > p_0 := [2/(H-\mathfrak{h})]$.

Appendix B: Proof of Proposition 1

Proof.
Step 1: proof of (13) 
We use the inequality 



1 

< — 
- T 



+ 



y(X t ) - v (X(6 t ))\dt 

T 



<p(X(O t ))dt - E{<p(X)) 



(66) 



Since $\varphi$ has polynomial growth and $\bar X$ has moments of any order, $\varphi(\bar X)$ is an integrable random variable and (10) implies that the second term on the right-hand side of (66) tends to $0$ almost surely. Now we treat the first one. The inequalities

\<p(x t ) - <p{x(o t ))\ < c v (i + \x t \* + \x{o t )\ p ) \x t - x(e t )\ 

< c^ p (1 + \X t - X{e t )\* + 2\X{8 t )\*) \X t - X{8 t )\ 



imply that 



i £ \<p(x t ) - <p(x(e t ))\ dt< c -f- £ \x t - x{e t )\* +l dt 



+ 



T 



\X{0 t )\*\X t -X{p t )\dt . 



(67) 



By (11), $|X_t - \bar X(\theta_t)|^{p+1}$ tends to $0$ almost surely and an integral version of the Toeplitz lemma implies that the first term on the right-hand side of (67) tends to $0$ almost surely. For the second one, by the Cauchy-Schwarz inequality,



T \ 2 

\x(9 t )\p\x t -x(e t )\dt) < 



\X(6 t )\ 2 »dt 



\X t -X{9 t )\ 2 dt 



and thus it tends to $0$ by the same arguments that we employed before. The proof of (13) is complete.

Step 2: proof of (14) 
First we write 



'n-l 



with 



tn Jo 



£ = - 



E^)%A +1 )(*) \d8--E(<p(X)) 



.fc=0 



< I 1 + I 2 



t 



'n-1 



n JO 



Yl I^C***) - vMI ^Wk+oW \ ds , and 



.fc=o 



- / <p(X s )ds-B( i p(X)) 

Cn JO 
By (13), $\lim_{n\to\infty} I^2_n = 0$ almost surely. We estimate $I^1_n$ as follows. First of all we write:

n Jo u=o J 

< C f(l+ sup \X U \A f n \^\X tk -X s \l [tkM {s)\ds . (68) 



n v 0<u<t n ' JO 



.k=0 



Since $b$ satisfies the polynomial growth condition, it holds for $t_k \le s < t_{k+1}$ that

\X.-X th \< [ \b(X u )\du + \Bf -B%\ 
Jt k 

< c b f (1 + \X u \ m )du + \\B H \\ 0jtnA \s - 

Jt k 

< c b ( 1 + ( sup |X u |) m ) e n + ||B H ||o, tniIl e£ . 



(69) 



Under the one-sided dissipative Lipschitz condition, Proposition 1 in [7] establishes that 

sup \X u \<c b (l + \x \ + ( sup \B^\)) 

0<u<t n \ v 0<w<i„ 7 / 

< + 1x01 + 11^11^,^) . 

with $0 < \mathfrak{h} < H$ that will be fixed later. We substitute the above inequality into (69):

\X S - X tk \ < c Mo e n + c b \\B H \\^ )t) C 2 " e n + \\B H \\ 0>tn>f) e n t k < s < t k+1 . (70) 

Using (70) in (68), we deduce that there exists a constant $C$ that depends on $b$, $\varphi$ and $x_0$ such that



I 1 n <C(l + \\B H \\l tntt) x (e n + \\B H \\^ C * e n + \\B H \\ ^ h e n 
<c (e n + llfl^Ho,^ 4 + H^llE!,, # 4 + \\B H \\^2 e n 



The almost-sure convergence of $I^1_n$ to $0$ will follow from a Borel-Cantelli argument. Indeed, for any $\eta > 0$ and for an integer $q$ that will be chosen later, it holds that



P( 



(141 > V) < %n + $V(\\B H \\lJ +^ p e f Edl^H^) 



7] V 



+t n ^^ n E(\\B H \\^y 



and by (65) (see Remark 7 in Appendix A) we obtain 

P(\I n \ >V)<-U + # + t to ^ ^fr+UUMO + ^K+P) 4 ^ 2 +P)(*-W 

<^( f ? + +9(H-F)) , qrf) f9 (j>+l)if-gf) , <? /(? (m 2 +p)/A 
Since $t_n^{\gamma} = n$ and $\varepsilon_n = n^{-(\gamma-1)/\gamma}$, we have $\sum_{n\ge1} P(|I^1_n| > \eta) \le \frac{C}{\eta^q}\,(S_1 + S_2 + S_3 + S_4)$ with

$$S_1 = \sum_{n\ge1} n^{-q(\gamma-1)/\gamma}, \qquad S_2 = \sum_{n\ge1} n^{-q(\mathfrak{h}\gamma-H)/\gamma},$$

$$S_3 = \sum_{n\ge1} n^{-q(\mathfrak{h}\gamma-(p+1)H)/\gamma}, \qquad S_4 = \sum_{n\ge1} n^{-q(\gamma-1-(m^2+p)H)/\gamma}.$$

It is supposed that $\gamma > 1 + (m^2 + p)H$. We choose $\mathfrak{h}$ close to $H$ in such a way that $\mathfrak{h}\gamma - H > 0$. Moreover, since $\gamma > p + 1$, one may choose $\mathfrak{h}$ such that it additionally satisfies $\gamma H > \gamma\mathfrak{h} > (p+1)H$. Now it is clear that we may find an integer $q$ in such a way that the above sums converge. The Borel-Cantelli lemma yields that $I^1_n$ converges to $0$ almost surely. □